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REVISION HISTORY 


This manual is the APMATH64/MAX Manual, Volume 4, 869-7482-991. The 
letter shown under the revision number column indicates the portion of 
the part number that changes for each revision. The last entry is the 
latest revision to this manual. 


The revision history begins with this manual. 


Deleted Utilities Library, deleted the
LPSPFI subroutine, added internal subroutine
information, and added 16 new routines.


Added new routines to Basic Math Library,
Double Precision Library, and Matrix Algebra
Accelerator Math Library. 12/87


NOTE: For revised manuals, a vertical line "|" outside the left
margin of the text signifies where changes have been made.


NOTE TO READER 


This is the fourth volume of the APMATH64 Manual. It 
is comprised of Appendix K, Appendix L, and a key 
word index for the APMATH64/MAX routines. Note that 
Appendix A continues through Volumes 1, 2, and 3. 
The page numbers are listed consecutively through the 
volumes. 


The APMATH64 Manual has three indices located at the 
end of Volume 3 and two at the end of Volume 4. The 
first index (Appendix I) is a list of the APMATH64
routines in page order by type. The second index 
(Appendix J) is an alphabetical list of the APMATH64 
routines. The third index is a key word index of the 
APMATH64 routines. The fourth index (Appendix L) is 
an alphabetical list of the APMATH64/MAX routines. 
The fifth index is a key word index of the 
APMATH64/MAX routines. 
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MATRIX ALGEBRA ACCELERATOR MATH LIBRARY ROUTINES 


K.1 INTRODUCTION 


This appendix contains documentation for the Matrix Algebra Accelerator
(MAX) Math Library routines. A high-level description of the MAX 
system is presented, followed by documentation for the two categories 
of MAX Math Library routines: the Basic MAX routines and the Matrix 
Oriented MAX routines. 




K.2 SYSTEM OVERVIEW 


An M64/140 or M64/145 MAX system consists of a Central Processing Unit
(CPU), a memory unit, and one or more MAX modules. The MAX modules are
collectively referred to as the "MAX". Each MAX module is capable of
performing calculations in parallel with the M64/140 or M64/145 CPU as
well as with other MAX modules.


The MAX Math Library software requires an M64/140 or M64/145 that has
been configured with at least 16K Table Memory RAM (TMRAM). The
software can utilize an arbitrary number of MAX modules up to and
including the maximum configuration of fifteen.


Each MAX module contains eight vector memories, each 2K (2048)
words in length. The Basic MAX routines are restricted to usage of the
first 2K-1 (2047) locations of each vector memory. The Matrix Oriented
MAX routines restrict usage as appropriate to the particular operation 
(refer to the manual page). 


The hardware on a MAX module directly supports the basic vector 
operations real and complex dot product, real vector scalar multiply 
and add (VSMA), and real vector multiply and scalar add (VMSA). Since 
the dot product operation is a vector to scalar operation, up to eight 
separate real dot products or four separate complex dot products can be
performed on a single MAX module. That is, all eight vector memories 
can contain input vectors for dot products. Since the VSMA and VMSA 
operations are vector to vector operations, up to four separate VSMA's 
or VMSA's can be performed on a single MAX module. For VSMA and VMSA 
operations, the vector memories on each MAX module are grouped into two 
banks of four vector memories each. One bank of four vector memories 
contains the input vectors, while the other bank of four vector 
memories contains the resulting output vectors. 
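
The per-module limits described above can be tabulated with a short Python sketch. It is an illustrative model only and not part of the MAX Math Library; the names `PER_MODULE`, `MAX_MODULES`, and `module_capacity` are hypothetical, and the sketch counts only the MAX modules, not the additional vectors the CPU can hold in TMRAM.

```python
# Per-module limits taken from the text above: eight real dot products,
# four complex dot products, or four VSMA/VMSA operations per MAX module.
PER_MODULE = {"dot": 8, "cdot": 4, "vsma": 4, "vmsa": 4}
MAX_MODULES = 15  # maximum configuration stated in the system overview


def module_capacity(operation: str, modules: int) -> int:
    """Return how many simultaneous operations `modules` MAX modules support."""
    if not 0 <= modules <= MAX_MODULES:
        raise ValueError("module count must be between 0 and 15")
    return PER_MODULE[operation] * modules


print(module_capacity("dot", 2))    # 16
print(module_capacity("vsma", 2))   # 8
```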


The CPU supplements the MAX performance by using TMRAM as an additional 
set of vector memories. In this way, the CPU can perform up to four 
real dot products, two complex dot products, one VSMA, or one VMSA in 
parallel with the MAX modules. Most of the MAX Math Library routines 
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The ranges of values which will produce overflow or underflow during 
conversion from FPS to IEEE and back to FPS format are given below. 
This conversion could be obtained by a vector move of data from Main 
Memory to a MAX vector register followed by a vector move of the data 
from the MAX vector register back to Main Memory. Both the standard 
scientific notation and the FPS internal binary representation (in 
hexadecimal) are given for each number.


      0.898846567431157758E+308  --+
      FFEF FFFF FFFF FFFF          |
                                   |   Positive overflow
      0.449423283715578989E+308    |
      FFE8 0000 0000 0000        --+

      0.449423283715578880E+308  --+
      FFCF FFFF FFFF FFFF          |
                                   |   No overflow/underflow
      0.222507385850720149E-307    |
      0068 0000 0000 0000        --+

      0.222507385850720090E-307  --+
      004F FFFF FFFF FFFF          |
                                   |   Positive underflow
      0.278134232313400170E-308    |
      0008 0000 0000 0000        --+

      0.0                              True zero
      0000 0000 0000 0000

     -0.278134232313400300E-308  --+
      0017 FFFF FFFF FFFF          |
                                   |
                                   |   Negative underflow
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MAX formatter exception interrupts can be selectively enabled or 
disabled in a manner similar to the arithmetic exception interrupts. 
The default disables all MAX formatter exception interrupts. 


The following code fragment enables all MAX formatter exception 
interrupts: 


      INTEGER UNFL, UNNORM, DENORM, OVFL

      UNFL = 1
      UNNORM = 1
      DENORM = 1
      OVFL = 1
      CALL SYS$ENA_FMTERR(UNFL, UNNORM, DENORM, OVFL)


The following code fragment disables all MAX formatter exception 
interrupts. 


      INTEGER UNFL, UNNORM, DENORM, OVFL

      UNFL = 1
      UNNORM = 1
      DENORM = 1
      OVFL = 1
      CALL SYS$DIS_FMTERR(UNFL, UNNORM, DENORM, OVFL)


For more information on enabling and disabling exception interrupts,
refer to the documentation for routines SYS$ENAEXC, SYS$DISEXC,
SYS$ENA_FMTERR, and SYS$DIS_FMTERR in the Operating System Manual, Volume
3, File and Memory Management, listed in Section 1.5.


K.3 BASIC MAX ROUTINES 


The Basic MAX routines provide for basic functional utilization of the 
MAX system and are vector oriented. 


K.3.1 Overview 


The Basic MAX routines are designed to access the basic functionality 
of the MAX modules. These routines are not as flexible as the Matrix 
Oriented MAX routines in terms of data management in the MAX system. 


The configuration table used by the Basic MAX routines references the 
available MAX modules. The table is always placed into the 17 highest 
addressable locations in TMRAM regardless of the base address of the 
TMRAM workspace. Hence, for 16K TMRAM systems, the table will be 
situated in locations 24559-24575. 
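
The table location quoted above can be checked with a small Python sketch. It is an illustration only; it assumes that TMRAM addresses begin at 8192 (the lowest ITMA the routines accept), and the names `TMRAM_BASE` and `config_table_range` are hypothetical.

```python
TMRAM_BASE = 8192   # assumed first TMRAM address (lowest ITMA the routines accept)
TABLE_WORDS = 17    # size of the MAX configuration table


def config_table_range(tmram_words: int) -> tuple[int, int]:
    """Return (first, last) address of the configuration table in TMRAM."""
    top = TMRAM_BASE + tmram_words - 1      # highest addressable TMRAM location
    return top - TABLE_WORDS + 1, top


print(config_table_range(16 * 1024))   # (24559, 24575)
```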
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Mathematically, the operations to be performed are: 
S(i) = SUM[A(j) * B(i,j); j = 1,4]; i = 1,14


The following is the APFTN64 code to perform the desired operation. 
Briefly, the PLOADD routine is called to load the array of vectors B 
into TMRAM and the MAX vector memories. The PDOT routine is then 
called to compute the dot products and return the results into the 
array S. (Refer to the documentation on PLOADD and PDOT for a 
discussion of the parameter values.) 


      REAL A(4),B(14,4),S(14)
      INTEGER I,J,N,M,ITMA,ISTART,IFUN,IERR

      I = 14
      J = 1
      N = 4
      M = 14
      ITMA = 8192
      ISTART = 1

      CALL PLOADD(B,I,J,N,M,ITMA,ISTART,IERR)
      IF(IERR.NE.0) GO TO 999

      I = 1
      J = 1
      N = 4
      M = 14
      ITMA = 8192
      ISTART = 1
      IFUN = 0

      CALL PDOT(A,I,N,S,J,M,ITMA,ISTART,IFUN,IERR)
      IF(IERR.NE.0) GO TO 999

  999 CONTINUE
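
For checking results off the machine, the arithmetic performed by the PLOADD/PDOT pair above, S(i) = SUM[A(j) * B(i,j)], can be modeled in a few lines of Python. This is a plain reference sketch with hypothetical data and a hypothetical function name (`parallel_dot`); it does not model TMRAM, the vector memories, or the error flag.

```python
def parallel_dot(a, b):
    """Return S with S[i] = sum over j of a[j] * b[i][j] for each loaded vector b[i]."""
    return [sum(aj * bij for aj, bij in zip(a, row)) for row in b]


# Hypothetical data: one vector A of 4 elements and 14 vectors B(i,*) of 4 elements.
a = [1.0, 2.0, 3.0, 4.0]
b = [[float(i)] * 4 for i in range(1, 15)]

s = parallel_dot(a, b)
print(len(s), s[0], s[13])   # 14 10.0 140.0
```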
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resulting output vectors and store them in the array of vectors C. 
(Refer to the documentation on PLOADV, PVSMA, and PUNLDV for a 
discussion of the parameter values.) 


      REAL A(4),S(7),B(7,4),C(7,4)
      INTEGER I,J,N,M,ITMA,ISTART,IFUN,IERR,IBANK

      I = 7
      J = 1
      N = 4
      M = 7
      IBANK = 0
      ITMA = 8192
      ISTART = 1

      CALL PLOADV(B,I,J,N,M,IBANK,ITMA,ISTART,IERR)
      IF(IERR.NE.0) GO TO 999

      I = 1
      J = 1
      N = 4
      M = 7
      ITMA = 8192
      ISTART = 1
      IFUN = 0

      CALL PVSMA(A,I,S,J,N,M,IBANK,ITMA,ISTART,IFUN,IERR)
      IF(IERR.NE.0) GO TO 999

      I = 7
      J = 1
      N = 4
      M = 7
      ITMA = 8192
      ISTART = 1

      CALL PUNLDV(C,I,J,N,M,IBANK,ITMA,ISTART,IERR)
      IF(IERR.NE.0) GO TO 999

  999 CONTINUE


Upon successful execution of PLOADV, PVSMA, and PUNLDV, the array C 
contains the following: 


      C =  0.0  0.0  0.0  0.0
           0.9  0.8  0.7  0.6
           1.8  1.6  1.4  1.2
           2.7  2.4  2.1  1.8
           3.6  3.2  2.8  2.4
           4.5  4.0  3.5  3.0
           5.4  4.8  4.2  3.6
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Basically, PLOADD is called to load M rows of the matrix A into TMRAM 
and the MAX vector memories. Each call to PDOT performs the M dot 
products of one column of the matrix B with the M rows of the matrix A 
loaded by PLOADD. These calculations are shown in Figure K-1.


         C          =          A          *          B
      N1 x N3               N1 x N2               N2 x N3


Figure K-1 Sample Calculations of Matrix Multiply


In general, the number of rows of the matrix A (N1 above) is not an
integral multiple of the number of dot products that a given system can
perform at the same time (M above). It is straightforward to add
APFTN64 code to handle the remaining MOD(N1,M) rows of the matrix A, as
well as to handle the case where N1 is less than M.
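
The blocking scheme described above, including the leftover MOD(N1,M) rows, can be sketched in Python as follows. This is a reference model under stated assumptions rather than APFTN64 code: each slice of A stands in for one PLOADD call, each column of B stands in for one PDOT call, and the function name `matmul_in_row_blocks` is hypothetical.

```python
def matmul_in_row_blocks(a, b, m):
    """Compute C = A * B, handling at most m rows of A per simulated load."""
    n1, n2, n3 = len(a), len(b), len(b[0])
    c = [[0.0] * n3 for _ in range(n1)]
    for start in range(0, n1, m):
        rows = a[start:start + m]            # one simulated PLOADD; the last block
                                             # holds only the MOD(N1,M) leftover rows
        for col in range(n3):                # one simulated PDOT per column of B
            column = [b[k][col] for k in range(n2)]
            for r, row in enumerate(rows):
                c[start + r][col] = sum(x * y for x, y in zip(row, column))
    return c


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]   # N1 = 5, N2 = 2
b = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]                               # N2 = 2, N3 = 3
c = matmul_in_row_blocks(a, b, m=2)          # blocks of 2, 2, and 1 rows
print(c[4])   # [9.0, 10.0, 48.0]
```

Because the slice `a[start:start + m]` is simply shorter for the final block, the same loop also covers the case where N1 is less than M.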


K.3.3 Routine Documentation 


This section contains the descriptions of the Basic MAX routines. 
*********                                        *********
*       *                                        *       *
* PCDOT *  --- PARALLEL COMPLEX DOT PRODUCT ---  * PCDOT *
*       *                                        *       *
*********                                        *********
PURPOSE: To compute the complex dot products of a single vector
with each of a set of vectors that were loaded by PLDCD.


CALL FORMAT: CALL PCDOT(A, I, N, S, J, M, ITMA, ISTART, IFUN, IERR) 


PARAMETERS: A = Floating-point input complex vector.
I = Integer input element stride for A.
N = Integer input number of elements in A.
S = Floating-point output complex vector.
J = Integer input element stride for S.
M = Integer input number of vectors loaded by PLDCD.

ITMA = Integer input TMRAM workspace base address
from the most recent call to PLDCD.

ISTART = Integer input starting index into the TMRAM
workspace and the MAX vector memories to begin
loading vectors. The first location of the MAX
vector memories and the TMRAM workspace has an
index value equal to one.

IFUN = Integer input addition/subtraction flag.

IFUN = 0: Use addition.
IFUN <> 0: Use subtraction.
IERR = Integer output error flag.


DESCRIPTION: PCDOT performs the M complex dot products of the vector 
contained in A with the M vectors loaded by a previous 
call to PLDCD. There is no check as to whether or not 
the vectors were actually loaded by PLDCD. The results 
are stored in the vector S. 


S(J*(2j-1))   = SUM[s * A(I*(2i-1),j) *
                    W((i+ISTART-1),j); i = 1 to N]

S(J*(2j-1)+1) = SUM[s * A(I*(2i),j) *
                    W((i+ISTART-1),j); i = 1 to N]

j = 1 to M

where


s =  1.0 if IFUN = 0
s = -1.0 if IFUN <> 0


and W(*,1:M) are the M vectors loaded by PLDCD.




MAX module #2 


Vector memory A: 6.1 6.3 6.5
              B: 6.2 6.4 6.6
              C: 7.1 7.3 7.5
              D: 7.2 7.4 7.6
              E: 8.1 8.3 8.5
              F: 8.2 8.4 8.6
              G: 9.1 9.3 9.5
              H: 9.2 9.4 9.6
TMRAM
TM(8192):  1.1
TM(8193):  1.2
TM(8194):  0.0
TM(8195):  0.0
TM(8196):  1.3
TM(8197):  1.4
TM(8198):  0.0
TM(8199):  0.0
TM(8200):  1.5
TM(8201):  1.6
TM(8202):  0.0
TM(8203):  0.0
TM(8204):  x.x
    :
TM(24558): x.x
TM(24559): MAX
    :      Configuration
TM(24575): Table


Given the following input parameters to PCDOT:


N = 3
M = 9
I = 2
J = 2
ITMA = 8192
ISTART = 1
IFUN = 0


A = (1.0,0.0) (1.0,0.0) (1.0,0.0)
**********                                                    **********
*        *                                                    *        *
* PCNV2D *  --- PARALLEL 2-D CONVOLUTION AND CORRELATION ---  * PCNV2D *
*        *                                                    *        *
**********                                                    **********
PURPOSE: To perform a 2-D convolution or correlation operation
on two matrices using the M64/140 or M64/145.


CALL FORMAT: CALL PCNV2D(A,MA,IA,JA,M,N,B,MB,NB,IB1,C,MC,IC,JC,IR,ITMA)


PARAMETERS: A = Floating-point input operand matrix. (column
ordered)

MA = Integer input number of rows of A

IA = Integer input initial row of the submatrix A'
of A to be processed (1 <= IA <= MA)

JA = Integer input initial column of the submatrix
A' of A to be processed (JA >= 1)

M = Integer input number of rows in A'
(1 <= M <= MA)

N = Integer input number of columns in A'
(N >= 1)

B = Floating-point input operator matrix. (column
ordered)

MB = Integer input number of rows of B

NB = Integer input number of columns of B

IB1 = Integer index of the operator element B that
is to coincide with the first operand element
of A' to be processed and output as the
corresponding element of C'. For correlation,
this index is counted columnwise relative to
the upper left-hand corner element of B.
For convolution, this index is counted
columnwise relative to the lower right-hand
corner element of B, since B is reversed.
(1 <= IB1 <= MB*NB)

C = Floating-point output matrix. (column ordered)

MC = Integer input number of rows of C

IC = Integer input initial row of C which locates the
submatrix C' of C; C' will be the processed A'
(1 <= IC <= MC)

JC = Integer input initial column of C which locates
the submatrix C' of C (JC >= 1)

IR = Integer input scalar flag
IR = 0: Perform convolution
IR <> 0: Perform correlation

ITMA = Integer input TMRAM workspace base address


DESCRIPTION: C(ic,jc) = SUM[SUM[A(ka,la) * B(k,l); k=1,MB]; l=1,NB]


for


i = 1 to M
j = 1 to N


where ic = i+IC-1
 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
 6.0   5.0  11.0  23.0  28.0  28.0  -8.0   0.0   0.0
 5.0   4.0  10.0  19.0  24.0  20.0  -8.0   0.0   0.0
 5.0   4.0  10.0  19.0  24.0  20.0  -8.0   0.0   0.0
 6.0   5.0  11.0  23.0  28.0  28.0  -8.0   0.0   0.0
-1.0  -1.0  -1.0  -4.0  -4.0  -8.0   0.0   0.0   0.0
a. 
 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0


IBl 
IR 
Ge 


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If N is less than one, then 
S(j) = 0.0; j = 1, M

Summary of error conditions: 
IERR = 0 Normal return.


IERR = 1 Insufficient MAX vector memories and 
TMRAM to hold the number of vector 
designated by M. Each MAX module can 
hold eight vectors. TMRAM can hold four 
vectors. 


IERR = 2 Vector length too large. N+ISTART-1 
must be less than or equal to 2047.


IERR = 3 ISTART or M is less than or equal to 
zero. Both of these parameters must 
be positive. 


IERR = 4 Insufficient TMRAM space available. 

There must be enough TMRAM available to 
hold 4*(N+ISTART-1) + 17 words, starting 
at the TMRAM workspace base address, 
ITMA. Although PDOT does not load TMRAM, 
it does check for consistency between N, 
ISTART, and ITMA. 


IERR = 5 ITMA less than 8192. ITMA must be
greater than or equal to 8192.


EXAMPLE: Given a system with two available MAX modules and 16K 
TMRAM that has been initialized by PLOADD as follows: 


MAX module #1 


Vector memory A:
              B:
              C:
              D:
              E:
              F:
              G:
              H:


MAX module #2 
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Upon RETURN from PDOT, S contains: 
S(1+(j-1)*I+(k-1)*J) = c*S(1+(j-1)*I+(k-1)*J)
+ SUM[s*A((i-1)*IA+1)*V(INDEX(i)+j-1,k); i=1,NA]; k=1,M
j = 1,NP
where


c = 1.0 if IACC = 0
c = 0.0 if IACC <> 0


s = 1.0 if IFUN = 0
s = -1.0 if IFUN <> 0


and V(*,1:M) are the M vectors loaded by PILOAD.
If NA is less than one, then

S(i,j) = 0.0; i = 1 to M, j = 1 to NP
Summary of error conditions: 


IERR = 0 Normal return.


IERR = 1 Insufficient MAX vector memories to hold
the number of vectors designated by M.
Each MAX module can hold 8 vectors.


IERR = 2 NA, NP, or M is less than or equal to
zero. Each of these parameters must be
positive.

IERR = 3 Insufficient TMRAM space available. 
There must be enough TMRAM to hold 17 
words starting at the TMRAM workspace 
base address ITMA. 


IERR = 4 ITMA less than 8192. ITMA must be 
greater than or equal to 8192. 


EXAMPLE: Given a system with one available MAX module and 16K TMRAM 
that has been initialized by PILOAD as follows: 
**********                                   **********
*        *                                   *        *
* PILOAD *  --- PARALLEL LOAD FOR PIDOT ---  * PILOAD *
*        *                                   *        *
**********                                   **********
PURPOSE: To load vectors from Main Memory into the MAX vector
memories in preparation for calls to PIDOT.


CALL FORMAT: CALL PILOAD (A, I, J, N, M, ITMA, IPTR, IERR) 


PARAMETERS: A = Floating-point input array of vectors.

I = Integer input element stride for vectors in A.

J = Integer input element stride between 
vectors in A. 

N = Integer input number of elements per vector 
in A. 

M = Integer input number of vectors to transfer. 


ITMA = Integer input TMRAM workspace base address. 

IPTR = Integer input offset into the MAX vector 
memories to begin accessing vector elements. 
This parameter is different than the ISTART 
parameter in PLOADD. 

IERR = Integer output error flag. 


DESCRIPTION: PILOAD loads the vectors contained in A into the MAX 
vector memories in preparation for calls to PIDOT. 
MAX vector memory location zero is reserved for PIDOT, 
therefore requiring the IPTR parameter to be greater 
than zero. PILOAD also sets up the MAX configuration 
table in TMRAM. 


Summary of Error Conditions:


IERR = 0 Normal return.


IERR = 1 Insufficient number of MAX vector
memories to hold the number of vectors
designated by M. Each MAX module can
hold 8 vectors.


IERR = 2 Vector length too large. N+IPTR must be
less than or equal to 2047.


IERR = 3 One or more of IPTR, N, or M is less
than or equal to zero. Each of these
parameters must be positive.
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MAX module #2 


Vector memory A: x.x  9.1  9.2  9.3  9.4
              B: x.x 10.1 10.2 10.3 10.4
              C: x.x 11.1 11.2 11.3 11.4
              D: x.x 12.1 12.2 12.3 12.4
              E: x.x 13.1 13.2 13.3 13.4
              F: x.x 14.1 14.2 14.3 14.4
TMRAM
TM(8192): MAX
    :     Configuration
TM(8208): Table
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Vector length too large. N+ISTART-1 must 
be less than or equal to 2047.


One or more of ISTART, N, or M is less 
than or equal to zero. Each of these 
parameters must be positive. 


Insufficient TMRAM space available. 
There must be enough TMRAM available to 
hold 4*(N+ISTART-1) + 17 words, starting 
at the TMRAM workspace base address, 
ITMA. 


ITMA less than 8192. ITMA must be 
greater than or equal to 8192. 


EXAMPLE: Given a system with two available MAX modules and 16K TMRAM 
and the following input parameters to PLDCD: 


now 
hm © 
ee) 


ISTART = 1 


A = (1.1,1.2) (1.3,1.4) (1.5,1.6)
    (2.1,2.2) (2.3,2.4) (2.5,2.6)
    (3.1,3.2) (3.3,3.4) (3.5,3.6)
    (4.1,4.2) (4.3,4.4) (4.5,4.6)
    (5.1,5.2) (5.3,5.4) (5.5,5.6)
    (6.1,6.2) (6.3,6.4) (6.5,6.6)
    (7.1,7.2) (7.3,7.4) (7.5,7.6)
    (8.1,8.2) (8.3,8.4) (8.5,8.6)
    (9.1,9.2) (9.3,9.4) (9.5,9.6)


Note that in this example, A is a two-dimensional complex 
matrix whose elements are stored in column major order. 
The vectors are loaded into the MAX vector memories as real
and imaginary pairs as well, with the real part of the first
number loaded into vector memory A and the imaginary part into
vector memory B. The next number will be loaded into vector
memories C (real part) and D (imaginary part), etc.
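
The real/imaginary pairing described above can be pictured with a short Python sketch. It is an illustrative model only: complex numbers are represented as (real, imaginary) tuples, the function name `layout_module` is hypothetical, and nothing here models how PLDCD distributes vectors between TMRAM and the modules.

```python
MEMORY_NAMES = "ABCDEFGH"   # the eight vector memories of one MAX module


def layout_module(vectors):
    """Map up to four complex vectors onto memory pairs (A,B), (C,D), (E,F), (G,H)."""
    if len(vectors) > 4:
        raise ValueError("one MAX module holds at most four complex vectors")
    layout = {}
    for k, vec in enumerate(vectors):
        layout[MEMORY_NAMES[2 * k]] = [re for re, _ in vec]       # real parts
        layout[MEMORY_NAMES[2 * k + 1]] = [im for _, im in vec]   # imaginary parts
    return layout


vectors = [
    [(6.1, 6.2), (6.3, 6.4), (6.5, 6.6)],
    [(7.1, 7.2), (7.3, 7.4), (7.5, 7.6)],
]
layout = layout_module(vectors)
print(layout["A"], layout["B"])   # [6.1, 6.3, 6.5] [6.2, 6.4, 6.6]
```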
IERR = 2 Vector length too large. N+ISTART-1
must be less than or equal to 2047.


IERR = 3 One or more of ISTART, N, or M is less
than or equal to zero. Each of these
parameters must be positive.


IERR = 4 Insufficient TMRAM space available.
There must be enough TMRAM available to
hold 4*(N+ISTART-1) + 17 words, starting
at the TMRAM workspace base address,
ITMA.


IERR = 5 ITMA less than 8192. ITMA must be
greater than or equal to 8192.


EXAMPLE: Given a system with two available MAX modules and 16K TMRAM 
and the following 


N = 
M = 
I = 
J =
ITMA =
ISTART =
A =  1.1  1.2  1.3  1.4
     2.1  2.2  2.3  2.4
     3.1  3.2  3.3  3.4
     4.1  4.2  4.3  4.4
     5.1  5.2  5.3  5.4
     6.1  6.2  6.3  6.4
     7.1  7.2  7.3  7.4
     8.1  8.2  8.3  8.4
     9.1  9.2  9.3  9.4
    10.1 10.2 10.3 10.4
    11.1 11.2 11.3 11.4
    12.1 12.2 12.3 12.4
    13.1 13.2 13.3 13.4
    14.1 14.2 14.3 14.4


Note that in this example, A is a two-dimensional
array whose elements are stored in column major order.
**********                             **********
*        *                             *        *
* PLOADV *  --- PARALLEL LOAD FOR ---  * PLOADV *
*        *       PVSMA AND PVMSA       *        *
**********                             **********
PURPOSE: To load vectors from Main Memory into TMRAM and
the MAX vector memories in preparation for calls
to PVSMA or PVMSA.


CALL FORMAT: CALL PLOADV(A, I, J, N, M, IBANK, ITMA, ISTART, IERR) 


PARAMETERS : A = Floating-point input array of vectors. 
I = Integer input element stride for vectors in A. 
J = Integer input element stride between 
vectors in A. 
N = Integer input number of elements per vector 
in A. 
M = Integer input number of vectors to transfer. 
IBANK = Integer input TMRAM region and bank of MAX
vector memories to load.
IBANK = 0: Load first TMRAM region and MAX
vector memories A,B,C,D.
IBANK <> 0: Load second TMRAM region and
MAX vector memories E,F,G,H.
ITMA = Integer input TMRAM workspace base address.
ISTART = Integer input starting index into the TMRAM
region and the MAX vector memories to begin
loading vector elements. The first location
of the TMRAM region and the MAX vector
memories has an index of one.
IERR = Integer output error flag.


DESCRIPTION: PLOADV loads the vectors contained in A into TMRAM 
and the MAX vector memories in preparation for calls 
to PVSMA or PVMSA. TMRAM is loaded first, so that if 
M equals 1, then only TMRAM is loaded. The remaining 
vectors are loaded into the MAX vector memories. 
PLOADV also sets up the MAX configuration table in 
the high end of TMRAM. The rest of the TMRAM 
workspace is partitioned into two regions. The first 
region corresponds functionally to MAX vector memories 
A,B,C,D, while the second region corresponds 
functionally to MAX vector memories E,F,G,H. 
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Note that in this example, A is a two-dimensional array 
whose elements are stored in column major order.
Upon RETURN from PLOADV, the MAX modules and TMRAM contain: 
MAX module #1 


Vector memory 


Ma a tw 


ee 608 6 6@e 688 


MAX module #2 


Vector memory E: 11.1 11.2 11.3 11.4 
Fs 13.1 13.2 13.3 13.4 


TMRAM
TM(8192):  x.x       First
    :                TMRAM
TM(16374): x.x       Region


TM(16375): 1.1
TM(16376): 1.2
TM(16377): 1.3       Second
TM(16378): 1.4       TMRAM
TM(16379): x.x       Region
    :
TM(24558): x.x


TM(24559): MAX
    :      Configuration
TM(24575): Table
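
The region boundaries in this example (first region ending at TM(16374), second region starting at TM(16375)) are consistent with splitting the workspace below the configuration table into two halves. The Python sketch below reproduces those addresses under that assumption; the even split with the odd word going to the second region is inferred from the example, not stated by the manual, and the function name `tmram_regions` is hypothetical.

```python
TABLE_WORDS = 17   # size of the MAX configuration table at the top of TMRAM


def tmram_regions(itma: int, tmram_top: int):
    """Return ((first_lo, first_hi), (second_lo, second_hi)) for the two regions."""
    table_start = tmram_top - TABLE_WORDS + 1
    usable = table_start - itma            # words between ITMA and the table
    first_len = usable // 2                # assumption: even split, extra word
                                           # (if any) goes to the second region
    first = (itma, itma + first_len - 1)
    second = (itma + first_len, table_start - 1)
    return first, second


print(tmram_regions(8192, 24575))   # ((8192, 16374), (16375, 24558))
```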
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EXAMPLE: 


INPUT: 


Let the input matrix A be: 


1.98 | 


-2.88 
1.99 
1.90 

-1.989 
1.99 
G.G9 
9.99 


LA 


OUTPUT: 
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-1.98 -1.99 
9 .9G 1.99 
~-1.99 .90 
1.68 i.gg 
1.98 -1.89 
-1.92 1.99 
1.99 1.09 
9.99 9.99 
8 
7 
Matrix A: 
G.9G 1.99 
-1.98 -G.59 
-1.98 -9.59 
1.99 -1.99 
“1.08 -@.25 
-1.99 -9.5d 
1.98 -9.59 


9.99 9.99 


2256767 


g 


1.99 
-1.089 
1.82 
G88 
-2.89 
1.90 
-1.89 
9599 


-1.99 
9.50 
1.98 

-1.96 
B.48 
GB. 88 
G.48 
9.99 


WBRRKAHNWMN RNR 
WRQNNQMNQM QQ 
\ 

XM 
a 


WREEF EH EB 
e 


1.98 
1.58 
2.88 
2.868 
8.59 
G.69 
-5.08 
9.99 


9.939 
9.99 
9.99 
9.99 
9.99 
3339 
9.99 
9.995 
The pivot vector IPVT contains the row interchange
information that was generated by PLUFAC while
performing partial pivoting.


EXAMPLE: 


INPUT: 


Let the factored input matrix A be: 


8.59 
-9.59 
8.58 
“8.58 
B.88 
~8.50 
-2.59 
2.99 


LA 


LX 


IPVT 


OUTPUT: 


@.88 
1.99 
-1.288 

1.96 
-1.96 
-1.969 

1.99 

9.99 


1.98 -1.88 
“G.58 8.59 
-$.58 1.98 
“1.98 -1.99 
-8.25 9.75 
-G.56 9.59 
“8.58 9.58 

9.99 9.99 


-1.989 
G.58 
1.88 

-1.98 
G.48 
G.89 
8.49 
9.99 


6.98 -3.808 9.986 


2256767 


Solution matrix X: 


-8.98 75.89 
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-6.99 56.90 


4.99 


-1.92 1.89 
%.58 1.59 
2.92 2.88 
1.69 2.88 
1.25 8.58 
2.98 9.69 
8.98 -5.98 
9.99 9.99 

-3.88 1.98 
-18.98 38.09 


Pag 


9.99 
9399 
9.99 
9499 
9.99 
9.99 
9.99 
9.99 


9.99 


eK - 


-1.09° 9.99 


43 


EXAMPLE: 


where


s = +1.0 (when IFUN = 0)
  = -1.0 (when IFUN <> 0)


TMRAM is used by this routine to hold up to 4 vectors of 
length MIN(NA,2047), and also to hold a compressed MAX
module configuration table of up to 17 words. The 
routine checks the amount of TMRAM on the system and 
returns an error code (see Summary of Error Conditions 
below) if the above requirement is not met. 


Implementation Note: 

Vector lengths are not restricted to the length of the 
MAX vector memories. When the vector length exceeds this 
length (i.e., NA > 2047), partial dot products are
calculated in the MAX and TMRAM and are accumulated on
the M64/140 or M64/145.
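
The accumulation of partial dot products described in the implementation note can be sketched in Python as follows. This is a reference model only: the chunk length of 2047 is the usable vector memory length given in this appendix, the function name `chunked_dot` is hypothetical, and the sketch does not model where each partial product is computed.

```python
VECTOR_MEMORY_WORDS = 2047   # usable length of a MAX vector memory


def chunked_dot(a, b, chunk=VECTOR_MEMORY_WORDS):
    """Dot product computed as a sum of partial dot products over fixed-size chunks."""
    total = 0.0
    for start in range(0, len(a), chunk):
        # One partial dot product over at most `chunk` elements ...
        partial = sum(x * y for x, y in zip(a[start:start + chunk],
                                            b[start:start + chunk]))
        total += partial     # ... accumulated into the final result
    return total


a = [1.0] * 5000             # longer than one vector memory: 3 chunks
b = [2.0] * 5000
print(chunked_dot(a, b))     # 10000.0
```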

Summary of Error Conditions: 


IERR = 0 Normal return.


IERR = 1 One or more of LA, LB, LC, MC, NC, or
NA is less than or equal to zero.
Each of these parameters must be
positive.


IERR = 2 ITMA less than 8192. ITMA must be 
greater than or equal to 8192. 


IERR = 3 Insufficient TMRAM scratch space. See 
DESCRIPTION section for details of TMRAM 
requirements. 

INPUT:
A: 1.0 1.0 1.0 1.0
   2.0 2.0 2.0 2.0
   3.0 3.0 3.0 3.0
   4.0 4.0 4.0 4.0
   5.0 5.0 5.0 5.0
   0.0 0.0 0.0 0.0
B: 1.0 2.0 3.0 4.0
   1.0 2.0 3.0 4.0
   1.0 2.0 3.0 4.0
   1.0 2.0 3.0 4.0
*********                         *********
*       *                         *       *
* PMOVE *  --- PARALLEL MOVE ---  * PMOVE *
*       *                         *       *
*********                         *********
PURPOSE: To move a row of elements across the MAX modules and
the element in TM.


CALL FORMAT: CALL PMOVE(A, IROW1, IROW2, ISTEP, N, IBANK, ITMA, IERR)


PARAMETERS: A = Floating-point output matrix.

IROW1 = Integer input element row index.

IROW2 = Integer input element row index.

ISTEP = Integer input element stride for A.

N = Integer input number of elements to move.

IBANK = Integer input bank switch (0 or 1).

ITMA = Integer input base address of the TMRAM
workspace.

IERR = Integer output error flag.


DESCRIPTION: PMOVE moves the element in TM indexed by IROW2 out to
Main Memory (matrix A) and moves the element indexed by
IROW1 to the location indexed by IROW2. The elements
indexed by IROW2 in the MAX vector memories are also
moved out to Main Memory (matrix A) and the elements
indexed by IROW1 across the MAX vector memories are moved
to the location indexed by IROW2.


If IBANK = 0, then only elements in vector memories A, B,
C, and D of the MAX vector memories are moved. If 

IBANK = 1 then only elements in vector memories E, F, G, 
and H are moved. 

This routine can be used in matrix factorization. 


Summary of error conditions:


IERR = 0 Normal return.


IERR = 1 Insufficient MAX vector memories and
TMRAM to hold the number of elements
designated by N. Each MAX module
can hold four vectors. TMRAM can hold
one vector.


IERR = 3 One or more of IROW1, IROW2, or N is
less than or equal to zero. Each of
these parameters must be positive.


IERR = 5 ITMA less than 8192. ITMA must be
greater than or equal to 8192.
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0


Upon RETURN from PMOVE, A contains: 


0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5.1  5.2  5.3  5.4  5.5  5.6  5.7  5.8  5.9
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0


TMRAM contains 


: Sock 


TM (8194) 
TM (8195) 
T (8196) 
TM (8197) 


7 MAX 


TM(24559) 


3 Table 


T1( 24575) 


The MAX vector memories contain 


MAX Module #1 


4.2 2.2 6.2 


3.2 


= 1.2 2.2 


A 


Vector Memory 


6.5 


3.5 4.5 2.5 


Led 295 


D= 


MAX module #2 


4.6 2.6 6.6 


3.6 


= 1.6 2.6 


A 


Vector Memory 


49 
INFO=K indicates that the K-th diagonal element
became 0.0. This does not cause the routine to
return, but indicates that a divide by 0 occurs
if the matrix-solving routine SGESL is called
with this output. If more than one diagonal
element becomes 0.0, then INFO is set to the last
one to do so.


The original input matrix A is overwritten by
the factored matrix.


For further information, see the LINPACK routine
by the same name. Dongarra, J.J., Bunch, J.R.,
Moler, C.B., and Stewart, G.W. LINPACK User's
Guide. Society for Industrial and Applied
Mathematics, Philadelphia, Pa., 1979.


EXAMPLE:
INPUT:
Let the input matrix A be:
 1.00 -1.00 -1.00  1.00  1.00  1.00  1.00  9.99
-2.00  0.00  1.00 -1.00 -1.00 -1.00  1.00  9.99
 1.00 -1.00  0.00  1.00  1.00  1.00  1.00  9.99
 1.00  1.00  1.00  0.00  1.00  1.00 -1.00  9.99
-1.00  1.00 -1.00 -2.00  0.00  1.00  1.00  9.99
 1.00 -1.00  1.00  1.00 -1.00  0.00  1.00  9.99
 0.00  1.00  1.00 -1.00  1.00  1.00  0.00  9.99
 9.99  9.99  9.99  9.99  9.99  9.99  9.99  9.99
LDA = 8
N = 7
OUTPUT:


Output matrix A:


-2.00  0.00  1.00 -1.00 -1.00 -1.00  1.00  9.99
 0.50 -1.00 -0.50  0.50  0.50  0.50  1.50  9.99
 0.50 -1.00 -2.00 -1.00  1.00  2.00  2.00  9.99
 0.50  1.00  0.50 -1.00 -1.00  1.00  2.00  9.99
-0.50  1.00  0.50 -0.50  2.50  1.25  0.50  9.99
 0.50 -1.00  1.00 -0.50 -0.80  0.50  0.60  9.99
 0.00  1.00  0.25 -0.75 -0.40  0.00 -0.20  9.99
 9.99  9.99  9.99  9.99  9.99  9.99  9.99  9.99


IPVT = 2 2 5 6 7 6 7


INFO = 0
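
The factorization in this example can be reproduced off the machine with a short Python port of the LINPACK SGEFA algorithm. The sketch below is an illustration, not the library routine: it follows the published LINPACK conventions (negated multipliers stored below the diagonal, interchanges applied only to the columns still being eliminated, 1-based pivot indices), works on the 7 x 7 active part of A, and uses a hypothetical lowercase function name.

```python
def sgefa(a, n):
    """LINPACK-style LU factorization with partial pivoting, done in place.

    Returns (ipvt, info) with 1-based pivot indices, as in the FORTRAN routine.
    """
    ipvt = [0] * n
    info = 0
    for k in range(n - 1):
        # Pivot row: first row at or below k with the largest |a[i][k]|.
        l = max(range(k, n), key=lambda i: abs(a[i][k]))
        ipvt[k] = l + 1
        if a[l][k] == 0.0:
            info = k + 1                  # zero pivot: column already triangular
            continue
        if l != k:
            a[l][k], a[k][k] = a[k][k], a[l][k]
        t = -1.0 / a[k][k]
        for i in range(k + 1, n):
            a[i][k] *= t                  # store negated multipliers
        for j in range(k + 1, n):
            t = a[l][j]
            if l != k:
                a[l][j] = a[k][j]
                a[k][j] = t
            for i in range(k + 1, n):
                a[i][j] += t * a[i][k]    # row elimination
    ipvt[n - 1] = n
    if a[n - 1][n - 1] == 0.0:
        info = n
    return ipvt, info


a = [
    [1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0],
    [-2.0, 0.0, 1.0, -1.0, -1.0, -1.0, 1.0],
    [1.0, -1.0, 0.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 0.0, 1.0, 1.0, -1.0],
    [-1.0, 1.0, -1.0, -2.0, 0.0, 1.0, 1.0],
    [1.0, -1.0, 1.0, 1.0, -1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, -1.0, 1.0, 1.0, 0.0],
]
ipvt, info = sgefa(a, 7)
print(ipvt, info)   # [2, 2, 5, 6, 7, 6, 7] 0
```

After the call, `a` holds the factored matrix, which agrees with the output matrix of the example to within floating-point rounding.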




In terms of the input parameters, the computation 
performed by PTSLVK can be described by the following 
equation, where V(I,J) represents element I in the J-th 
vector memory: 


V(IEND,J) = [ V(IEND,J) + s*T(1)*V(IBEG,J) +
              s*T(1+I)*V(IBEG+1,J) + ... +
              s*T(I*(IEND-IBEG-1)+1)*V(IEND-1,J) ] * Q


where s = +1.0 (when IFUN = 0)
        = -1.0 (when IFUN <> 0)
and   Q = 1.0 (if IUD = 0)
        = T(I*(IEND-IBEG)+1) (if IUD <> 0)


J ranges from 1 to the number of vectors available. For 
example, with N MAX modules, 8*N+4 values of V(IEND,J) 
would be computed on each call to PTSLVK. 
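
The update applied to a single vector memory can be modeled in Python as shown below. This is a sketch under stated assumptions: the list `v` is padded at index 0 so that the 1-based indices of the equation can be used directly, `t` is an ordinary 0-based list (so T(k) is `t[k-1]`), every product term carries the factor s, and the names `ptslvk_step` and `vectors_per_call` are hypothetical.

```python
def ptslvk_step(v, t, ibeg, iend, stride=1, ifun=0, iud=0):
    """Update v[iend] for one vector memory, following the equation above."""
    s = 1.0 if ifun == 0 else -1.0
    acc = v[iend]
    for k in range(iend - ibeg):                 # k = 0 .. IEND-IBEG-1
        acc += s * t[stride * k] * v[ibeg + k]   # s * T(I*k+1) * V(IBEG+k,J)
    q = 1.0 if iud == 0 else t[stride * (iend - ibeg)]   # T(I*(IEND-IBEG)+1)
    v[iend] = acc * q
    return v[iend]


def vectors_per_call(modules):
    """Values of V(IEND,J) computed per call: 8 per MAX module plus 4 in TMRAM."""
    return 8 * modules + 4


v = [None, 1.0, 2.0, 3.0, 10.0]      # hypothetical vector memory, index 0 unused
t = [1.0, 1.0, 1.0, -1.0]            # hypothetical row of T
print(ptslvk_step(v, t, ibeg=1, iend=4, iud=1))   # -16.0
print(vectors_per_call(2))                        # 20
```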


Implementation note: 


When the user's data structure is in the form of a 2-D 
FORTRAN array, the higher level routine PTSOLV should be 
called to solve for the entire solution matrix X with a 
single call. PTSLVK may also be called directly by the
user to handle matrix data structures that are not in 
this form. 


Summary of error conditions: 
IERR = 0 Normal return.


IERR = 1 Insufficient MAX vector memories and 
TMRAM to hold the number of vectors 
designated by NV. Each MAX module can 
hold eight vectors. TMRAM can hold four 
vectors. 


IERR = 2 Invalid vector specification.
(IEND-IBEG) must be greater than or equal
to zero, and less than 2048.


IERR = 3 IBEG or IEND is less than or equal to
zero. Both of these parameters must
be positive.


IERR = 4 Insufficient TMRAM space available. 
There must be enough TMRAM available 
to hold 4*(IEND-IBEG+1) + 17 words, 
starting at the TMRAM workspace base 
address, ITMA. Although PTSLVK does not 
load TMRAM, it does check for consistency 
between IEND, IBEG, and ITMA. 
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T™ (24559): MAX 
é Configuration 
T™ (24575): Table 


Given the following input parameters to PTSLVK: 


ni 
IBEG = 
IEND = 


a a ae 


ITMA = 8192 


T= 1.9 1.80 1.8 -1.9 


MAX module #1 . 
Element Number 4 


Vector memory A: 6.2 
B: 8.2 
Cs 19.2 
D: 12.2 
E: 14.2 
Es 16.2 
G: 18.2 
H: 28.2 
MAX module #2 
Element Number 4 
Vector memory A: 28.2 
B: 22.2 
C: 24.2 
D: 26.2 


TMRAM 


TM(8204): 2.2
TM(8205): 4.2
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DESCRIPTION: PTSOLV should be used to solve for the unknown matrix X 
in the matrix equations TX = B or XT = B (where T is 
upper or lower triangular), whenever the matrices are 
in the form of a 2-D FORTRAN array. For other data 
structures, routine PTSLVK can be used in conjunction 
with PLOADD and PUNLDD. 


The matrix B is first loaded into the MAX and TMRAM
vector memories, and is overwritten and reused as the
forward elimination or back substitution process
continues (termed growing the solution matrix).


In terms of the input parameters, the computation
performed by PTSOLV can be described by the equations
below, which apply to solving TX = B when T is lower
triangular. V(I,J) represents element I in the J-th
vector memory. Also, T(I,I) has been reciprocated when
IUD <> 0.


V(I,J) = [ V(I,J) +
           s*T(I,1)*V(1,J) + s*T(I,2)*V(2,J) + ... +
           s*T(I,I-1)*V(I-1,J) ] * Q


where s = +1.0 (when IFUN = 0)
        = -1.0 (when IFUN <> 0)
and   Q = 1 (if IUD = 0)
        = T(I,I) (if IUD <> 0)
and   I = 1,2,...,N
      J = 1,2,...,(8*NMAXM+4),


where NMAXM = number of MAX modules available. When 
the solution is completed, matrix V is copied from MAX 
and TMRAM memory to the solution matrix X. 
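
As a cross-check on the equations, the lower triangular case can be written as an ordinary forward elimination. The following Python sketch is an assumption-laden model, not the library routine: the function name is hypothetical, T is passed unreciprocated and the model forms 1/T(I,I) itself, and the columns of B stand in for the vectors held in MAX and TMRAM.

```python
def ptsolv_lower_model(T, B, ifun=1, iud=1):
    """Model of the PTSOLV equations for TX = B with T lower triangular.

    T -- list of rows of the triangular matrix (diagonal NOT reciprocated;
         the model takes the reciprocal, standing in for the step that the
         text says has already been applied when IUD <> 0)
    B -- list of rows of the right-hand side
    Returns V, which holds X when ifun <> 0 and iud <> 0.
    """
    s = 1.0 if ifun == 0 else -1.0
    V = [row[:] for row in B]              # B is copied, then overwritten
    for i in range(len(T)):
        q = 1.0 if iud == 0 else 1.0 / T[i][i]
        for j in range(len(V[i])):
            acc = V[i][j]
            for k in range(i):             # terms T(I,1) ... T(I,I-1)
                acc += s * T[i][k] * V[k][j]
            V[i][j] = acc * q
    return V
```

Each pass over I uses only rows that are already final, which is why the text can describe the solution matrix as growing in place.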


Summary of error conditions: 
IERR = 0 Normal return.
IERR = 1 One or more of N, LT, LB, or LX is less
than or equal to zero. Each of these
parameters must be positive.


IERR = 2 ITMA less than 8192. ITMA must be
greater than or equal to 8192.
IERR = 3 Insufficient TMRAM space available.
There must be enough TMRAM available to
hold 4*N + 17 words, starting at the
TMRAM workspace base address, ITMA.




**********                                         **********
*        *                                         *        *
* PUNLDD *   --- PARALLEL UNLOAD FOR PTSLVK ---    * PUNLDD *
*        *                                         *        *
**********                                         **********
PURPOSE: To unload a set of vectors from TMRAM and the MAX
vector memories after calls to PTSLVK.


CALL FORMAT: CALL PUNLDD (A, I, J, N, M, ITMA, ISTART, IERR) 


PARAMETERS: A = Floating-point output matrix into which
TMRAM and the MAX vector memories are unloaded.
I = Integer input element stride for vectors in A.
J = Integer input element stride between
vectors in A.
N = Integer input number of elements per vector.
M = Integer input number of vectors to unload.
ITMA = Integer input base address of the TMRAM
workspace.
ISTART = Integer input starting index into the TMRAM
workspace and the MAX vector memories to begin
unloading vectors. The first location of the
TMRAM workspace and the MAX vector memories has
an index value equal to one.
IERR = Integer output error flag. See "Summary of
Error Conditions" below.


DESCRIPTION: This routine performs the reverse of routine PLOADD, 
unloading the vectors contained in TMRAM and the MAX 
vector memories into A. Vectors are unloaded with the 
constraints that TMRAM is unloaded first, and the MAX 
is unloaded with multiples of four vectors. PUNLDD 
assumes that PLOADD has set up the MAX configuration
table in the high end of TMRAM.
Summary of Error Conditions: 


IERR = 0 Normal return.


IERR = 1 Insufficient TMRAM and MAX vector
memories to hold the number of vectors
designated by M. Each MAX module can
hold eight vectors. TMRAM can hold four
vectors.


IERR = 2 Vector length too large. N+ISTART-1 must
be less than or equal to 2047.


IERR = 3 One or more of ISTART, N, or M is less
than or equal to zero. Each of these
parameters must be positive.
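
The stride parameters and the error conditions can be illustrated with a small Python model. It is a sketch under stated assumptions rather than a description of PUNLDD itself: the function name is hypothetical, the order in which the three conditions are tested is assumed (the text does not give a precedence), and vectors[v] simply holds the v-th vector without modelling where it resides in TMRAM or the MAX modules.

```python
MAX_INDEX = 2047        # largest permitted value of N+ISTART-1 (from the text)

def punldd_model(vectors, a, i, j, n, m, istart, nmax):
    """Copy M vectors of N elements into the flat list a; return IERR.

    a[e*i + v*j] receives element ISTART+e of vector v (0-based e, v),
    so i is the stride within a vector and j the stride between vectors.
    nmax is the number of available MAX modules (assumed to be known).
    """
    if m > 8 * nmax + 4:                   # eight per module, four in TMRAM
        return 1
    if n + istart - 1 > MAX_INDEX:
        return 2
    if istart <= 0 or n <= 0 or m <= 0:
        return 3
    for v in range(m):
        for e in range(n):
            a[e * i + v * j] = vectors[v][istart - 1 + e]
    return 0
```

With i equal to the number of vectors and j equal to one, each vector lands in a row of a column-major two-dimensional array; swapping the two strides places each vector in a column instead.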
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1 
T (82985): 2.4 
T (8296): xx.x 


TM(24558): xx.x


TM(24559): MAX
Configuration
TM(24575): Table


Given the following input parameters to PUNLDD: 


Upon RETURN from PUNLDD, Main Memory contains: 


A:  1.1  1.2  1.3  1.4
    2.1  2.2  2.3  2.4
    3.1  3.2  3.3  3.4
    4.1  4.2  4.3  4.4
    5.1  5.2  5.3  5.4
    6.1  6.2  6.3  6.4
    7.1  7.2  7.3  7.4
    8.1  8.2  8.3  8.4
    9.1  9.2  9.3  9.4
   10.1 10.2 10.3 10.4
   11.1 11.2 11.3 11.4
   12.1 12.2 12.3 12.4
   13.1 13.2 13.3 13.4
   14.1 14.2 14.3 14.4


Note that in this example, A is a two-dimensional array 
whose elements are stored in column major order. 
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IERR =1 Insufficient MAX vector memories and 
TMRAM to hold the number of vectors 
designated by M. Each MAX module can 
hold four vectors. TMRAM can hold one 
vector. 


IERR = 2 Vector length too large. N+ISTART-1
must be less than or equal to 2047.


IERR = 3 One or more of ISTART, N, or M is less
than or equal to zero. Each of these
parameters must be positive.

IERR = 4 Insufficient TMRAM space available.
There must be enough TMRAM available
to hold 2*(N+ISTART-1) + 17 words,
starting at the TMRAM workspace base
address, ITMA. Although PUNLDV does not
load TMRAM, it does check for
consistency between N, ISTART, and ITMA.


IERR = 5 ITMA less than 8192. ITMA must be 
greater than or equal to 8192. 
EXAMPLE: Given a system with two available MAX modules and 16K TMRAM
that contains the following:


MAX module #1


Vector memory A: 0.9 0.8 0.7 0.6
B: 1.8 1.6 1.4 1.2
C: 2.7 2.4 2.1 1.8
D: 3.6 3.2 2.8 2.4
MAX module #2
Vector memory A: 4.5 4.0 3.5 3.0
B: 5.4 4.8 4.2 3.6


TMRAM


TM(8192): 0.0
TM(8193): 0.0
TM(8194): 0.0      First
TM(8195): 0.0      TMRAM
TM(8196): x.x      Region
    .
    .
TM(16374): x.x
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& 2 2 & 
* PVMSA * —— PARALLEL VMSA -—— * PVMSA * 
*& . & ; & 
REKEREEREEE REEEREREKRE 
PURPOSE: To compute the vector multiply and scalar add (VMSA) of 


a single vector with a set of vectors residing in
TMRAM and the MAX vector memories. 


CALL FORMAT: CALL PVMSA(B, K, S, J, N, M, IBANK, ITMA, ISTART, 
IFUN, IERR) 
PARAMETERS: B = Floating-point input vector.
K = Integer input element stride for B.
S = Floating-point input array of scalars.
J = Integer input element stride for S.
N = Integer input number of elements per vector.
M = Integer input number of VMSA's to perform.
IBANK = Integer input TMRAM region and bank of MAX
vector memories containing the input set of
vectors. Integer output TMRAM region and
bank of MAX vector memories containing the
output set of vectors.
IBANK = 0: Reference the first TMRAM
region and MAX vector
memories A,B,C,D.
IBANK <> 0: Reference the second TMRAM
region and MAX vector memories
E,F,G,H.
ITMA = Integer input TMRAM workspace base address
from the most recent call to PLOADV.
ISTART = Integer input starting index into the
TMRAM region and the MAX vector memories
to begin accessing/storing vector elements.
The first location of the TMRAM region and
the MAX vector memories has an index of one.
IFUN = Integer input addition/subtraction flag.
IFUN = 0: Use addition.
IFUN <> 0: Use subtraction.
IERR = Integer output error flag.


DESCRIPTION: PVMSA computes the M VMSA's of the vector B with 
the set of M vectors residing in TMRAM and the MAX 
vector memories, using the elements of S as the M 
scalar values. The results are stored in the 
other region of TMRAM and bank of MAX vector 
memories as indicated by toggling the value of
IBANK. 
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EXAMPLE: Given a system with two available MAX modules and 16K TMRAM 
that contains the following: 
MAX module #1 
Vector memory E: 3.1 3.2 3.3 3.4
F: 5.1 5.2 5.3 5.4
G: 7.1 7.2 7.3 7.4
H: 9.1 9.2 9.3 9.4
MAX module #2


Vector memory E: 11.1 11.2 11.3 11.4
F: 13.1 13.2 13.3 13.4


TMRAM
TM(8192): x.x          First
    .                  TMRAM
TM(16374): x.x         Region


TM(16375): 1.1
TM(16376): 1.2
TM(16377): 1.3         Second
TM(16378): 1.4         TMRAM
TM(16379): x.x         Region
    .
TM(24558): x.x


TM(24559): MAX
Configuration
TM(24575): Table


Given the following input parameters to PVMSA: 
B = -1.1 -1.2 -1.3 -1.4


K=1 


NAM WN 
RMRMWMRMRN MN WN 
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T™ (24559): MAX 
. Configuration 
™ (24575): Table 
IF (IBANK .EQ. 0) IBANK = 1
ELSE IBANK = 0
V(ISTART+i,j) = W(ISTART+i,j) + s*B(i+1)*S(j);


i = 0,N-1;
j = 1,M


where


s = 1.0 if IFUN = 0
s = -1.0 if IFUN <> 0


and W(*,1:M) are the input set of vectors and
V(*,1:M) are the output set of vectors.


PUNLDV is called by the user to retrieve the 
resulting vectors V(*,1:M), when appropriate. 
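
The update and the bank toggle can be modelled as below. This Python sketch is illustrative only: the function name is hypothetical, the strides K and J are both assumed to be one, and the output bank is modelled as a copy of the input bank with the N updated elements replaced (the real contents of the other bank outside that range are not specified by the text).

```python
def pvmsa_model(W, b, scalars, n, m, ibank, istart, ifun):
    """Return (V, new_ibank) for V(ISTART+i,j) = W(ISTART+i,j) + s*B(i+1)*S(j).

    W       -- W[j] is the j-th input vector (0-based j)
    b       -- the single input vector B, unit stride assumed
    scalars -- the array S, unit stride assumed
    """
    s = 1.0 if ifun == 0 else -1.0
    V = [vec[:] for vec in W]              # stand-in for the other bank
    for j in range(m):
        for i in range(n):
            V[j][istart - 1 + i] = W[j][istart - 1 + i] + s * b[i] * scalars[j]
    new_ibank = 1 if ibank == 0 else 0     # results live in the other bank
    return V, new_ibank
```

Because IBANK is both an input and an output, a chain of calls alternates between the two banks without any data being moved by the caller.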


Summary of error conditions: 
IERR = 0 Normal return.


IERR = 1 Insufficient MAX vector memories and
TMRAM to hold the number of vectors
designated by M. Each MAX module can
hold four vectors. TMRAM can hold one
vector.


IERR = 2 Vector length too large. N+ISTART-1
must be less than or equal to 2047.


IERR = 3 One or more of ISTART, N, or M is less
than or equal to zero. Each of these
parameters must be positive.


IERR = 4 Insufficient TMRAM space available.
There must be enough TMRAM available
to hold 2*(N+ISTART-1) + 17 words,
starting at the TMRAM workspace base
address, ITMA. Although PVMSA does not
load TMRAM, it does check for
consistency between N, ISTART, and ITMA.


IERR = 5 ITMA less than 8192. ITMA must be 
greater than or equal to 8192. 
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: 


819 
ISTART 
IFUN = 


Upon RETURN from PVMSA, IBANK is equal to zero,


1 
4 
7 
1 
2 
a 
g 


and the MAX modules and TMRAM contain: 


MAX module #1 


Vector memor 


MAX module #2 


Vector memor 


TMRAM 
T (8192): 
TM (8193): 
TM (8194): 


-TM (8195): 
TM (8196): 


T( 16374): 


TM (16375): 
TM(16376): 
TM( 16377): 
TM(16378): 
TM(16379): 


TM(24558): 
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y As 
Bs: 
Cs 
D: 


y As 


X.X 


4.5 4.8 3.5 
5.4 4.8 4.2 


3.8 
3.6 


ilsl Lis2 Tl.2 21.4 
13«1 13.2 13.3 13.4 


First 


TMRAM 


Partition 


Second 


TMRAM 


Partition 
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K.4 MATRIX ORIENTED MAX ROUTINES 


The Matrix Oriented MAX routines emphasize matrix processing and 
provide for expanded utilization of the MAX system. 


K.4.1 Overview 


The Matrix Oriented MAX routines were written to enhance the basic 
functionality of the MAX modules. These routines are more flexible 
than the Basic MAX routines in terms of data management in the MAX 
system. They are also more efficient in performing matrix-matrix
operations, particularly for small matrices.


There are two major differences between the Matrix Oriented MAX 
routines and the Basic MAX routines, resulting in increased performance 
and improved user interfaces. First, the Matrix Oriented MAX routines 
accept submatrices as input. Second, they present a familiar, 
consistent model of the MAX vector memories. 


A minor difference between the Basic MAX routines and the Matrix
Oriented MAX routines is that the Matrix Oriented MAX routines check
for the correct number of parameters and exit with no action if the
check fails. If the parameter check succeeds, the IERR flag will be
set to a code upon exit from the routine. An incorrect number of
parameters can therefore be detected by setting IERR to an unused
error code, e.g., 999, before calling a Matrix Oriented MAX routine.
If IERR is unchanged when the routine exits, then an incorrect number
of parameters was passed.
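
The sentinel technique can be sketched in Python as follows. Everything in the block is a stand-in invented for illustration: fake_mload is a hypothetical routine that expects nine parameters, and a one-element list plays the role of the IERR argument that FORTRAN passes by reference.

```python
SENTINEL = 999          # an error code that no routine is expected to return

def fake_mload(*params):
    """Hypothetical stand-in: expects 9 parameters, the last being IERR."""
    if len(params) != 9:
        return                  # wrong count: exit with no action
    params[-1][0] = 0           # correct count: IERR is always set on exit

def call_with_sentinel(routine, *args):
    ierr = [SENTINEL]           # set IERR to the sentinel before the call
    routine(*args, ierr)
    if ierr[0] == SENTINEL:     # untouched, so the parameter check failed
        return "wrong number of parameters"
    return "ok" if ierr[0] == 0 else "error %d" % ierr[0]
```

The check costs one assignment before the call and one comparison after it, which is why the examples in this appendix set IERR = 999 before every call.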

The configuration table used by the Matrix Oriented MAX routines is
used to reference the available MAX modules. Because the Matrix
Oriented MAX routines permit operations with individual vector
memories, the configuration table is more extensive than the one used
by the Basic MAX routines. The table is always placed into the 254
highest addressable locations in TMRAM regardless of the base address
of the TMRAM workspace. Hence, for 16K TMRAM systems, the table will
be situated in locations 24322-24575. Similarly, for 32K TMRAM
systems, the table will be situated in locations 40706-40959. The rest
of the TMRAM workspace is used to store input/output vectors.
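
The table location follows from simple address arithmetic, sketched here in Python. The constant TMRAM_BASE = 8192 is an assumption inferred from the 16K figure quoted in the text and from the requirement elsewhere in this appendix that ITMA be at least 8192; the function name is illustrative.

```python
TMRAM_BASE = 8192       # assumed lowest TMRAM address
TABLE_WORDS = 254       # size of the Matrix Oriented configuration table

def config_table_range(tmram_words):
    """Return (first, last) address of the configuration table."""
    last = TMRAM_BASE + tmram_words - 1    # highest addressable location
    first = last - TABLE_WORDS + 1
    return first, last
```

For a 16K system this gives 24322 through 24575, and for a 32K system 40706 through 40959.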




K.4.1.1 Submatrix Input 


All floating-point data passed to the Matrix Oriented MAX routines can 
be organized as submatrices. This results in improved performance for 
small matrices. With a single call to a Matrix Oriented MAX routine, 
computations can be performed on an entire submatrix. 

A submatrix is defined by a starting element, an intra-vector
element stride, an inter-vector element stride, the number of elements
per vector, and the number of vectors. A single vector is a degenerate
case of a submatrix, where the number of vectors is one. A single
element is also a degenerate submatrix, where both the number of
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If VECMEM is interpreted as a matrix of column vectors, then IVM is the 
starting row, IVN is the starting column, NEV is the number of rows, 
and NVB is the number of columns as shown in Figure K-2. 


[Figure K-2: VECMEM drawn as a grid of column vectors. The selected
submatrix, marked with x, occupies rows IVM through IVM+NEV-1 and
columns IVN through IVN+NVB-1.]


Figure K-2 MAX Vector Memory Submatrix 


There are several advantages to using this model with the Matrix 
Oriented MAX routines. 


o The user is insulated from the hardware details of the machine
  and can use familiar concepts to visualize the operations;
  thus the routines are easy to use.


o All the routines permit operations with a single element, a
  single vector, or a submatrix in the vector memories; thus
  they are general.


o In most cases, results from one routine can be used as input
  to another routine without moving data in the vector memories;
  thus the data management is consistent.
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K.4.2.1 Real Vector Mapping 


A set of real vectors is mapped one-to-one to the vector memories. The
first four vectors are mapped to the A, B, C, D vector memories of the 
lowest addressed MAX module. The second four vectors are mapped to the 
A, B, C, D vector memories of the next lowest addressed MAX module and 
so on, until there are no more available MAX modules. The next two 
vectors are mapped to the A and B vector memories simulated in TMRAM. 
The next four vectors are mapped to the E, F, G, H vector memories of 
the lowest addressed MAX module. The next four vectors are mapped to 
the E, F, G, H vector memories of the next lowest addressed MAX module 
and so on, until as before, there are no more available MAX modules. 
The last two vectors are mapped to the C and D vector memories 
simulated in TMRAM.
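
The mapping order described above can be expressed as a small lookup function. This Python sketch is a model of the text only; the function name and the ("MAX k", letter) return convention are assumptions, with "MAX 1" standing for the lowest addressed module.

```python
def real_vector_location(v, nmax):
    """Map the 1-based real vector number v to (unit, vector memory letter)."""
    total = 8 * nmax + 4
    if not 1 <= v <= total:
        raise ValueError("vector number out of range")
    k = v - 1
    if k < 4 * nmax:                       # A,B,C,D of each module in turn
        return ("MAX %d" % (k // 4 + 1), "ABCD"[k % 4])
    k -= 4 * nmax
    if k < 2:                              # A and B simulated in TMRAM
        return ("TMRAM", "AB"[k])
    k -= 2
    if k < 4 * nmax:                       # E,F,G,H of each module in turn
        return ("MAX %d" % (k // 4 + 1), "EFGH"[k % 4])
    k -= 4 * nmax
    return ("TMRAM", "CD"[k])              # C and D simulated in TMRAM
```

With one module, for example, vectors 1 to 4 fall in A to D, vectors 5 and 6 in TMRAM A and B, vectors 7 to 10 in E to H, and vectors 11 and 12 in TMRAM C and D.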
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The vector memories simulated in TMRAM begin at the address ITMA, and 
are interleaved as shown in Figure K-3.


                  TMRAM Address
+----------+
|   A(1)   |      ITMA
+----------+
|   B(1)   |      ITMA+1
+----------+
|   C(1)   |      ITMA+2
+----------+
|   D(1)   |      ITMA+3
+----------+
|   A(2)   |      ITMA+4
+----------+
|   B(2)   |      ITMA+5
+----------+
|   C(2)   |      ITMA+6
+----------+
|   D(2)   |      ITMA+7
+----------+


Figure K-3 TMRAM Simulated Vector Memories
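
The interleaving in Figure K-3 corresponds to a one-line address formula. The Python sketch below assumes that the pattern shown for elements 1 and 2 continues with period four for all later elements; the function name is illustrative.

```python
def tmram_address(itma, letter, element):
    """Address of the 1-based element of simulated vector memory A, B, C or D."""
    return itma + 4 * (element - 1) + "ABCD".index(letter)
```

For ITMA = 8192, A(1) is at 8192 and D(2) is at 8199, matching the figure.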


K.4.2.2 Complex Vector Mapping 


Mapping a set of complex vectors to the MAX vector memories is similar 
to the real vector mapping, except that a pair of vector memories is 
needed for each complex vector. The real part of each complex vector 
is mapped to the first vector memory of a pair. The imaginary part of 
a complex vector is mapped to the second vector memory of a pair. 


The first two vectors in the set are mapped to the A, B, C, D vector 
memories of the lowest addressed MAX module. The second two vectors 
are mapped to the A, B, C, D vector memories of the next lowest 
addressed MAX module, and so on, until there are no more available MAX 
modules. The next vector is mapped to the A and B vector memories 
simulated in TMRAM. The next two vectors are mapped to the E, F, G, H 
vector memories of the lowest addressed MAX module. The next two 
vectors are mapped to the E, F, G, H vector memories of the next lowest 
addressed MAX module, and so on, until as before, there are no more 
available MAX modules. The last vector is mapped to the C and D vector 
memories simulated in TMRAM. 
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K.4.3 Examples 


The following examples all involve real data. The applicable complex
counterparts are similar. The major difference between the complex
cases and the real is that the complex case has half as many available
vector memories as the real case.


Assume for the purpose of these examples that: 


o There is one available MAX module, so that the total
  number of vector memories is given by
  NVEC = 8 * 1 + 4 = 12
o As in Section K.4.1.2, the vector memories will be modeled as
  a matrix of column vectors represented by a FORTRAN matrix
  VECMEM declared as follows:
  REAL VECMEM(2048,NVEC)
  Note again that VECMEM is never actually declared in any of
  the code and is used solely to model operations performed with
  the MAX vector memories.
o The matrix B, denoted by [B], is declared as


  REAL B (3, 4)


  and is initialized as follows:


  B: 1.1 1.2 1.3 1.4
     2.1 2.2 2.3 2.4
     3.1 3.2 3.3 3.4


o The matrix A, denoted by [A], is declared as
  REAL A (2, 3)
  and is initialized as follows:


  A: 11.1 11.2 11.3
     12.1 12.2 12.3


o The matrix C, denoted by [C], is declared as
  REAL C (2, 4)

The examples in Sections K.4.3.1 through K.4.3.6 illustrate how various
data structures can be moved to and from the MAX vector memories. The
examples in Sections K.4.3.7 and K.4.3.8 illustrate how dot products
can be performed between various data structures. The examples in
Sections K.4.3.9 and K.4.3.10 illustrate how VSMA's can be performed
between various data structures.
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K.4.3.1 Matrix Load 


[B] can be loaded into the vector memories by a single call to MLOAD.
If input parameters to MLOAD are set up as follows:


IBE = 1
IBV = 3
IVM = 1
IVN = 1
NEV = 3
NVB = 4
ITMA = 8192


then the sequence


      IERR = 999
      CALL MLOAD (B, IBE, IBV, IVM, IVN, NEV, NVB, ITMA, IERR)
      IF (IERR .NE. 0) THEN
         WRITE (6, 1000) IERR
 1000    FORMAT (1X, 'MLOAD ERROR = ',I4)
      ENDIF


will load [B] by columns into the vector memories starting at the first
row and first column of VECMEM, and will also check to make sure that
MLOAD did not detect an error. After this sequence the vector memories
will contain the following:


MAX vector memories (VECMEM):


1.1 1.2 1.3 1.4 x.x x.x x.x x.x x.x x.x x.x x.x
2.1 2.2 2.3 2.4 x.x x.x x.x x.x x.x x.x x.x x.x
3.1 3.2 3.3 3.4 x.x x.x x.x x.x x.x x.x x.x x.x
x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x
x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x
 .   .   .   .   .   .   .   .   .   .   .   .
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K.4.3.3 Single Column Load 


A single column of [B] can be loaded into a vector memory by a single 
call to MLOAD. To load column 2 into vector memory 3, the parameters 
are set up as follows: 


IBE = 1
IBV = 3
IVM = 1
IVN = 3
NEV = 3
NVB = 1
ITMA = 8192
The sequence


      IERR = 999
      CALL MLOAD (B(1,2), IBE, IBV, IVM, IVN, NEV, NVB, ITMA, IERR)
      IF (IERR .NE. 0) THEN
         WRITE (6, 1000) IERR
 1000    FORMAT (1X, 'MLOAD ERROR = ',I4)
      ENDIF


leaves the vector memories as follows: 


MAX vector memories (VECMEM): 
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K.4.3.5 Single Element Load 


A single element of [B] can be loaded into an element of a vector 
memory by a call to MLOAD. To load B(1,2) into the vector memories at 
row 4 and column 3, the parameters are set up as follows: 


IBE = 1
IBV = 3
IVM = 4
IVN = 3
NEV = 1
NVB = 1
ITMA = 8192
The sequence 


      IERR = 999
      CALL MLOAD (B(1,2), IBE, IBV, IVM, IVN, NEV, NVB, ITMA, IERR)
      IF (IERR .NE. 0) THEN
         WRITE (6, 1000) IERR
 1000    FORMAT (1X, 'MLOAD ERROR = ',I4)
      ENDIF


leaves the vector memories as follows: 


MAX vector memories (VECMEM): 


x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x
x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x
x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x
x.x x.x 1.2 x.x x.x x.x x.x x.x x.x x.x x.x x.x
x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x
x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x x.x
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K.4.3.7 Matrix Multiplication: [A]*[B] (Dot Products) 


To illustrate the flexibility of the Matrix Oriented MAX routines, two 
examples of matrix multiplication using dot product operations are 
given below. 


Matrix multiplication using dot products can be performed using the MAX 
in two different ways: either load [B] into the vector memories or 
load [A] into the vector memories. The preferred method depends on the 
dimensions of the matrices. 


The matrix loaded should maximize three parameters: reuse, vector 
length, and fit. Reuse is the number of times each element of data 
loaded into a vector memory is used in subsequent computations. Vector 
length is the number of elements used per vector operation. Fit is a 
measure of the average number of vector memories used per vector 
operation. 


Note that by setting the strides and counts appropriately, [A]T*[B] or
[A]*[B]T or [A]T*[B]T can also be performed.


The first example loads columns of [B] into the vector memories. The 
sequence 


      IBE = 1
      IBV = 3
      IVM = 1
      IVN = 1
      NEV = 3
      NVB = 4
      ITMA = 8192
      IERR = 999
      CALL MLOAD (B, IBE, IBV, IVM, IVN, NEV, NVB, ITMA, IERR)
      IF (IERR .NE. 0) THEN
         WRITE (6,1000) IERR
 1000    FORMAT (1X, 'MLOAD ERROR = ',I4)
      ELSE
         IAE = 2
         IAV = 1
         ICE = 2
         ICV = 1
         NVA = 2
         IFUN = 0
         IERR = 999
         CALL MDOT (A, IAE, IAV, IVM, IVN, C, ICE, ICV,
     +              NEV, NVA, NVB, IFUN, IERR)
         IF (IERR .NE. 0) THEN
            WRITE (6,1001) IERR
 1001       FORMAT (1X, 'MDOT ERROR = ',I4)
         ENDIF
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K.4.3.8 Vector Dot Product 


By restricting the vector count of one of the matrices to one, the dot 
products between that vector and the other matrix can be calculated. 
This mode of operation is similar to the functionality of the Basic MAX 
routines. This example performs the dot products between the second 
row of [A] and all the columns of [B]. Note that by simply changing 
the element strides for [A], a column of [A] could be used instead. 

The sequence 


      IBE = 1
      IBV = 3
      IVM = 1
      IVN = 1
      NEV = 3
      NVB = 4
      ITMA = 8192
      IERR = 999
      CALL MLOAD (B, IBE, IBV, IVM, IVN, NEV, NVB, ITMA, IERR)
      IF (IERR .NE. 0) THEN
         WRITE (6,1000) IERR
 1000    FORMAT (1X, 'MLOAD ERROR = ',I4)
      ELSE
IAE = 


j 


A) 
h 
YOWORHFKYPNEFHN 


: 


2 TeyaT 


         CALL MDOT (A(2,1), IAE, IAV, IVM, IVN, C, ICE, ICV,
     +              NEV, NVA, NVB, IFUN, IERR)
         IF (IERR .NE. 0) THEN
            WRITE (6,1001) IERR
 1001       FORMAT (1X, 'MDOT ERROR = ',I4)
         ENDIF
      ENDIF


writes the results into [C] as follows: 


C: 77.06 80.72 84.38 88.04
    x.x   x.x   x.x   x.x
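
The dot products can be checked independently of the MAX routines. The Python sketch below assumes the contents of [A] given in Section K.4.3 and takes B(i,j) = i + j/10, which is the pattern visible in the VECMEM listing of Section K.4.3.1; the helper name is illustrative.

```python
# Assumed data: [A] as declared in K.4.3, and B(i,j) = i + j/10.
A = [[11.1, 11.2, 11.3],
     [12.1, 12.2, 12.3]]
B = [[i + j / 10.0 for j in range(1, 5)] for i in range(1, 4)]

def row_times_matrix(row, matrix):
    """Dot product of one row vector with every column of matrix."""
    return [sum(row[k] * matrix[k][j] for k in range(len(row)))
            for j in range(len(matrix[0]))]

# Second row of [A] against all four columns of [B], rounded for display.
second_row = [round(x, 2) for x in row_times_matrix(A[1], B)]
print(second_row)
```

Rounded to two decimals, the four products come out as 77.06, 80.72, 84.38 and 88.04.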
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The first example clears the appropriate submatrix of the vector 
memories, uses [B] as the scalars, and columns of [A] as the vectors 
for VSMA operations. 


The sequence 


      IVM = 1
      IVN = 1
      NEV = 2
      NVB = 4
      ITMA = 8192
      IERR = 999
      CALL MLOAD (0.0, 0, 0, IVM, IVN, NEV, NVB, ITMA, IERR)
      IF (IERR .NE. 0) THEN
         WRITE (6,1000) IERR
 1000    FORMAT (1X, 'MLOAD ERROR = ',I4)
      ELSE
         IAE = 1
         IAV = 2
         IBE = 3
         IBV = 1
         NVA = 3
         IFUN = 0
         IERR = 999
         CALL MVSMA (A, IAE, IAV, IVM, IVN, B, IBE, IBV,
     +               NEV, NVA, NVB, IFUN, IERR)
         IF (IERR .NE. 0) THEN
            WRITE (6,1001) IERR
 1001       FORMAT (1X, 'MVSMA ERROR = ',I4)
         ELSE
            ICE = 1
            ICV = 2
            IERR = 999
            CALL MUNLD (IVM, IVN, C, ICE, ICV, NEV, NVB, IERR)
            IF (IERR .NE. 0) THEN
               WRITE (6,1002) IERR
 1002          FORMAT (1X, 'MUNLD ERROR = ',I4)
            ENDIF
         ENDIF
      ENDIF


writes the results into [C] as follows: 


C: 70.76 74.12 77.48 80.84
   77.06 80.72 84.38 88.04


Notice the call to MVSMA changes the value of IVN, meaning that the 
column of VECMEM containing the results is not the same as the column 
of VECMEM containing the inputs. Since MUNLD can unload from any 
column, it receives the modified IVN value for copying results out to 
C. 
K.4.3.10 Vector VSMA's


Just as the dot product routines can be used to compute vector-matrix 
dot products, the VSMA/VMSA routines can also be used to compute 
VSMA/VMSA vector-matrix operations. This example uses the second 
column of [A] as the scalars, the third row of [B] as the vector, and 
[C] as the matrix. Assume that [C] is initialized as follows:


C: 5.1 5.2 5.3 5.4
   6.1 6.2 6.3 6.4
The sequence 


      ICE = 1
      ICV = 2
      IVM = 1
      IVN = 1
      NEV = 2
      NVB = 4
      ITMA = 8192
      IERR = 999
      CALL MLOAD (C, ICE, ICV, IVM, IVN, NEV, NVB, ITMA, IERR)
      IF (IERR .NE. 0) THEN
         WRITE (6,1000) IERR
 1000    FORMAT (1X, 'MLOAD ERROR = ',I4)
      ELSE
         IAE = 1
         IAV = 2
         IBE = 3
         IBV = 1
         NVA = 1
         IFUN = 0
         IERR = 999
         CALL MVSMA (A(1,2), IAE, IAV, IVM, IVN, B(3,1), IBE, IBV,
     +               NEV, NVA, NVB, IFUN, IERR)
         IF (IERR .NE. 0) THEN
            WRITE (6,1001) IERR
 1001       FORMAT (1X, 'MVSMA ERROR = ',I4)
         ELSE
            IERR = 999
            CALL MUNLD (IVM, IVN, C, ICE, ICV, NEV, NVB, IERR)
            IF (IERR .NE. 0) THEN
               WRITE (6,1002) IERR
 1002          FORMAT (1X, 'MUNLD ERROR = ',I4)
            ENDIF
         ENDIF
      ENDIF


writes the results into [C] as follows: 


C: 39.82 41.04 42.26 43.48
   43.92 45.24 46.56 47.88
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MATRIX ORIENTED MAX ROUTINES 
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associated with A and B. The operation of CMDOT can 

be conveniently described if the set of MAX vector 
memories is considered to be a complex matrix VECMEM, 
and A and C are also considered to be complex matrices. 
Using this matrix convention, CMDOT performs the matrix 
multiplication 


C = q*C + r * A * VECMEM
where IAE, IAV, IVM, IVN, ICE, ICV, NEV, NVA, and NVB
allow the user to select subsets of A, VECMEM, and C
as appropriate.


To illustrate the flexibility of the data structures
that can be associated with the data, suppose A and C
are one-dimensional complex arrays, and that VECMEM is
a complex matrix VECMEM(2048,4*NMAX+2) where NMAX is
the number of available MAX modules. In this case, the
computations performed by CMDOT can be described in
FORTRAN by:


      DO 30 i = 1, NVA
         DO 20 j = 1, NVB
            TEMP = CMPLX(0.0,0.0)
            DO 10 k = 1, NEV
               TEMP = TEMP + r * A((i-1)*IAV+(k-1)*IAE+1) *
     2                VECMEM(IVM+k-1,IVN+j-1)
 10         CONTINUE
            C((i-1)*ICE+(j-1)*ICV+1) = TEMP +
     +                q * C((i-1)*ICE+(j-1)*ICV+1)
 20      CONTINUE
 30   CONTINUE
Care should be taken if A and C overlap. There are no 
checks that ensure the integrity of A is maintained, 
so values of A could be overwritten before they are 
used in subsequent calculations. 
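
For readers who want to experiment with the indexing, the FORTRAN loops can be transcribed into Python. This is a sketch of the loops as printed, not of CMDOT: the function name is hypothetical, q and r are passed directly instead of being derived from IFUN (the mapping from IFUN to q is not spelled out in this excerpt), and vecmem is an ordinary list of rows.

```python
def cmdot_model(a, iae, iav, vecmem, ivm, ivn, c, ice, icv,
                nev, nva, nvb, q, r):
    """C = q*C + r * A * VECMEM on flat complex lists a and c.

    Indices follow the FORTRAN listing with 1-based offsets converted to
    0-based: A((i-1)*IAV+(k-1)*IAE+1) becomes a[i*iav + k*iae].
    """
    for i in range(nva):
        for j in range(nvb):
            temp = 0j
            for k in range(nev):
                temp += r * a[i * iav + k * iae] * vecmem[ivm - 1 + k][ivn - 1 + j]
            idx = i * ice + j * icv
            c[idx] = temp + q * c[idx]
    return c
```

Setting q to zero overwrites C with the products, while q equal to one accumulates into the existing contents.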
Summary of error conditions: 

IERR = -3 NVA <= 0.

IERR = -2 NVB <= 0.


IERR = -1 NEV <= 0. C is cleared when IFUN = 0
or IFUN = 1.


IERR = 0 No error occurred. Normal completion.
IERR = 1 No TMRAM on the system.

IERR = 4 IVM <= 0.

IERR = 5 IVM + NEV - 1 greater than 2048.
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IVM = 3 

IVN = 2 

NVB = 5 

Input C 
(1.8,1.2) 
(2.8,2.8) 
(3.9,3.8) 
{4.8,4.8) 
(5.8,5.8) 
(1.9,1.9) 
(1.8,1.9) 
(2.8,2.8) 
(3.9,3.8) 
(4.9,4.@) 
(1.9,1.8) 
(1.8,1.8) 
(1.9,1.9) 
(2.9,2.8) 
(3.8,3.f) 

ICE = 2 

Icv = 18 

NEV = 4 

IFUN = 3 

IERR = 999 

Upon return 
( 5.2, -23.9) 
( 7.8, -25.98) 
(19.8, -32.f) 
(14.9, -46.9) 
(19.2, -69.9) 
( 5.2, -39.9) 
( 6.9, ~46.@) 
( 9.8, -61.9) 
(13.9, -87.9) 
(18.0,-126.9) 
( 5.8, -55.8) 
( 6.8, -66.9) 
( 8.9, -99.9) 
(12.8,-128.9) 
(17.9,-183.92) 

IERR = 0
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To illustrate the flexibility of the data structures 
that can be associated with the data, suppose B is 

a one-dimensional complex array, and that VECMEM is 
a complex matrix VECMEM(2048,4*NMAX+2) where NMAX is
the number of available MAX modules. In this case,
the operations performed by CMLOAD can be described
in FORTRAN by:


      DO 20 i = 1, NVB
         DO 10 j = 1, NEV
            VECMEM(IVM+j-1,IVN+i-1) = B((i-1)*IBV+(j-1)*IBE+1)
 10      CONTINUE
 20   CONTINUE
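
The effect of the two strides is easiest to see by running the same loops on a small array. The Python sketch below transcribes the listing with 0-based offsets; the function name and the list-of-rows model of VECMEM are assumptions made for illustration.

```python
def cmload_model(b, ibe, ibv, vecmem, ivm, ivn, nev, nvb):
    """Copy a strided submatrix of the flat list b into vecmem.

    ibe is the stride between elements of one vector in b, and ibv is the
    stride between the starts of successive vectors.
    """
    for i in range(nvb):                   # vector (column of VECMEM)
        for j in range(nev):               # element within the vector
            vecmem[ivm - 1 + j][ivn - 1 + i] = b[i * ibv + j * ibe]
    return vecmem
```

If b holds a 3 by 2 matrix in column major order, IBE = 1 and IBV = 3 load its columns as vectors, while IBE = 3 and IBV = 1 load its rows as vectors, which is how a transposed load is obtained without rearranging Main Memory.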


Summary of error conditions: 
IERR = -2 NVB <= 0.


IERR = -1 NEV <= 0.


IERR = 0 No error occurred. Normal completion.

IERR = 1 No TMRAM on the system.

IERR = 2 ITMA <= 8191.

IERR = 3 ITMA > LASTTM-8192-254-4*(IVM+NEV-1)
where LASTTM is the highest valid TMRAM
address. Refer to Section K.2.

IERR = 4 IVM <= 0.

IERR = 5 IVM + NEV - 1 greater than 2048.

IERR = 6 IVN <= 0.

IERR = 7 IVN + NVB - 1 greater than NMAX*4 + 2
where NMAX is the number of available
MAX modules. Refer to Section K.2.


If there are too many or too few formal parameters, then 
IERR is left unchanged. 


For more information on the Matrix Oriented MAX 
routines, refer to Section K.4. 
**********                                    **********
*        *                                    *        *
* CMUNLD *   --- COMPLEX MATRIX UNLOAD ---    * CMUNLD *
*        *                                    *        *
**********                                    **********


PURPOSE: To unload a matrix of complex vectors from the MAX vector
memories into Main Memory.


CALL FORMAT: CALL CMUNLD (IVM, IVN, B, IBE, IBV, NEV, NVB, IERR)


PARAMETERS: IVM = Integer input starting element in the MAX
vector memories.
IVN = Integer input starting vector in the MAX
vector memories.
B = Floating-point output matrix of complex
vectors in Main Memory.
IBE = Integer input element stride for vectors in B.
IBV = Integer input element stride between
vectors in B.
NEV = Integer input number of complex elements
per vector.
NVB = Integer input number of complex vectors.
IERR = Integer input/output error flag. See
DESCRIPTION for a list of error conditions.


DESCRIPTION: CMUNLD unloads the NVB vectors contained in the MAX
vector memories to Main Memory defined by B, IBE, IBV,
and NEV. An arbitrary data structure can be associated
with B. The operation of CMUNLD can be conveniently
described if the set of MAX vector memories is considered
to be a matrix VECMEM and B is also considered to be a
matrix. Using this matrix convention, CMUNLD performs
the matrix transfer
B = VECMEM
where IBE, IBV, IVM, IVN, NVB, and NEV allow the user to
select subsets of B and VECMEM as appropriate.
To illustrate the flexibility of the data structures that
can be associated with the data, suppose B is a
one-dimensional array, and that VECMEM is a matrix
VECMEM(2048,8*NMAX+4) where NMAX is the number of
available MAX modules. In this case, the operations
performed by CMUNLD can be described in FORTRAN by:

      DO 20 i = 1, NVB
         DO 10 j = 1, NEV
            B((i-1)*IBV+(j-1)*IBE+1) = VECMEM(IVM+j-1,IVN+i-1)
 10      CONTINUE
 20   CONTINUE


Summary of error conditions: 
IERR = -2 NVB <= 0.
IERR = -1 NEV <= 0.
IERR = 0 No error occurred. Normal completion.
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{* CMVSMA * ——-—— COMPLEX VECTOR MULTIPLY SCALAR ADD ---—- * CMVSMA * 
|* & £ £ 
| #eaaaeeees kReEkKKKKKEK 
PURPOSE:      To perform complex vector scalar multiply add (CVSMA)
              operations between a matrix of vectors in Main Memory,
              a matrix of vectors in the MAX vector memories, and
              a matrix of scalars in Main Memory.

CALL FORMAT:  CALL CMVSMA (A, IAE, IAV, IVM, IVN, C, ICE, ICV,
                           NEV, NVA, NVB, IFUN, IERR)


PARAMETERS:   A    = Floating-point input matrix of complex vectors
                     in Main Memory.
              IAE  = Integer input element stride for vectors in A.
              IAV  = Integer input element stride between vectors
                     in A.
              IVM  = Integer input starting element in the MAX
                     vector memories.
              IVN  = Integer input/output starting vector for the
                     input/output vectors in the MAX vector memories.
              C    = Floating-point input matrix of scalars in Main
                     Memory.
              ICE  = Integer input element stride for C.
              ICV  = Integer input element stride between vectors
                     in C.
              NEV  = Integer input CVSMA length.
              NVA  = Integer input number of vectors in A to be used.
              NVB  = Integer input number of vectors in the MAX
                     vector memories to be used as input. Also
                     the number of scalars per vector of C to be used.
              IFUN = Integer input function flag.
                        IFUN = 0:  r =  1.0
                        IFUN = 1:  r = -1.0
                     Note that IFUN is a bit-mapped function flag:
                     IFUN = 2 is equivalent to IFUN = 0, IFUN = 3
                     is equivalent to IFUN = 1, etc.
                     See DESCRIPTION for the usage of r.
              IERR = Integer input/output error flag.
                     See DESCRIPTION for a list of error conditions.


DESCRIPTION:  CMVSMA performs complex VSMA operations of length NEV
              between the NVA vectors in Main Memory defined by A, IAE,
              and IAV with the NVB vectors in the MAX vector memories
              defined by IVM and IVN, using the elements of C in Main
              Memory defined by ICE and ICV as the scale factors. The
              output of the VSMA operations is written into the MAX
              vector memories.

              A system with NMAX available MAX modules has

                   NVEC = 4 * NMAX + 2

   10       CONTINUE
   20    CONTINUE
   30 CONTINUE

Summary of error conditions:
   IERR = -3   NVA <= 0.
   IERR = -2   NVB <= 0.
   IERR = -1   NEV <= 0. C is cleared.
   IERR =  0   No error occurred. Normal completion.
   IERR =  1   No TMRAM on the system.
   IERR =  4   IVM <= 0.
   IERR =  5   IVM + NEV - 1 greater than 2048.
   IERR =  6   IVN <= 0.
   IERR =  7   IVN + NVB - 1 greater than NMAX*4.

If there are too many or too few formal parameters then
IERR is unchanged and no other action is taken.
For more information on the Matrix Oriented MAX routines, 
refer to Section K.4. 


EXAMPLE:      Assume one available MAX module and that the data in
              A and C is stored in column major order (normal FORTRAN).
              Perform the complex VSMA operations of the rows of a
              submatrix of A with a subset of the MAX vector memories,
              using the scalars contained in the columns of a submatrix
              of C.

Input matrix A:

   (1.0,1.0)  (1.0,1.0)  (1.0,1.0)  (1.0,1.0)
   (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (2.0,2.0)  (1.0,1.0)  (1.0,1.0)  (1.0,1.0)
   (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)

   IAE = 12
   IAV = 4
   NVA = 2

Input matrix C:

   (1.0,0.0)  (X.X,X.X)  (2.0,0.0)  (X.X,X.X)
   (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (3.0,0.0)  (X.X,X.X)  (1.0,0.0)  (X.X,X.X)
   (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)

   ICE = 6
   ICV = 28

MAX vector memories (VECMEM):

   (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (1.0,1.0)  (2.0,2.0)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (1.0,1.0)  (1.0,1.0)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (1.0,1.0)  (1.0,1.0)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (1.0,1.0)  (1.0,1.0)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)
   (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)  (X.X,X.X)

* MDOT *  --- MATRIX DOT PRODUCT ---  * MDOT *

PURPOSE:      To perform dot products between a matrix of vectors
              in Main Memory and a matrix of vectors in the MAX
              vector memories.

CALL FORMAT:  CALL MDOT(A, IAE, IAV, IVM, IVN, C, ICE, ICV, NEV,
                        NVA, NVB, IFUN, IERR)

PARAMETERS:   A    = Floating-point input matrix of vectors in
                     Main Memory.
              IAE  = Integer input element stride for vectors in A.
              IAV  = Integer input element stride between
                     vectors in A.
              IVM  = Integer input starting element in the MAX
                     vector memories.
              IVN  = Integer input starting vector in the MAX
                     vector memories.
              C    = Floating-point output matrix of results.
              ICE  = Integer input element stride for vectors in C.
              ICV  = Integer input element stride between
                     vectors in C.
              NEV  = Integer input dot product length.
              NVA  = Integer input number of vectors in A to be used.
              NVB  = Integer input number of vectors in the MAX
                     vector memories to be used.
              IFUN = Integer input function flag.
                        IFUN = 0:  r =  1.0, q = 0.0
                        IFUN = 1:  r = -1.0, q = 0.0
                        IFUN = 2:  r =  1.0, q = 1.0
                        IFUN = 3:  r = -1.0, q = 1.0
                     Note that IFUN is a bit-mapped function flag:
                     IFUN = 4 is equivalent to IFUN = 0; IFUN = 5
                     is equivalent to IFUN = 1, etc.
                     See DESCRIPTION for the usage of q and r.
              IERR = Integer input/output error flag.
                     See DESCRIPTION for a list of error conditions.

DESCRIPTION:  MDOT performs dot products of length NEV between the
              NVA vectors in Main Memory defined by A, IAE, and IAV
              with the NVB vectors in the MAX vector memories defined
              by IVM and IVN. The NVA*NVB results are written to
              locations in Main Memory defined by C, ICE, and ICV.


Specifically, for each vector in A, MDOT performs 
the NVB dot products between that vector and the NVB 


vectors in the MAX vector memories, 
NVB results into C. 
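
The bit-mapped IFUN flag listed under PARAMETERS can be checked with a few lines of code. The sketch below uses Python purely as an executable notation; it is not part of the MAX library, and the name decode_ifun is hypothetical. It assumes, from the IFUN table alone, that bit 0 selects the sign of r, that bit 1 selects q, and that higher bits are ignored.

```python
def decode_ifun(ifun):
    """Return (r, q) for an MDOT-style function flag.

    Assumption drawn from the IFUN table: bit 0 selects r = -1.0,
    bit 1 selects q = 1.0, and all higher bits are ignored.
    """
    r = -1.0 if ifun & 1 else 1.0
    q = 1.0 if ifun & 2 else 0.0
    return r, q


# Tabulate the first eight flag values; 4..7 repeat 0..3.
table = [decode_ifun(ifun) for ifun in range(8)]
```

Under these assumptions IFUN = 4 decodes to the same pair as IFUN = 0 and IFUN = 5 to the same pair as IFUN = 1, which matches the note in the parameter list.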
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IERR = 7 IVN + NVB - 1 greater than NMAX*8 + 4 
where NMAX is the number of available 
MAX modules. Refer to Section K.2. 


If there are too many or too few formal parameters, then 
IERR is left unchanged. 


For more information on the Matrix Oriented MAX 
routines, refer to Appendix K.4. 


EXAMPLE: Assume one available MAX module, that the MAX vector 
memories (VECMEM) have been loaded by MLOAD as indicated, 
and that the data in A is stored in column major order 
(normal FORTRAN). Multiply the transpose of a submatrix of 
A by a subset of the MAX vector memories, negate the 
result and accumulate to C. 


Input matrix A:

   1.0  x.x  x.x  3.0  x.x  x.x  5.0  x.x  x.x
   x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x
   2.0  x.x  x.x  4.0  x.x  x.x  6.0  x.x  x.x
   x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x
   2.0  x.x  x.x  4.0  x.x  x.x  6.0  x.x  x.x
   x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x
   3.0  x.x  x.x  5.0  x.x  x.x  7.0  x.x  x.x
   x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x
   3.0  x.x  x.x  5.0  x.x  x.x  7.0  x.x  x.x
   x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x
   4.0  x.x  x.x  6.0  x.x  x.x  8.0  x.x  x.x
   x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x
   4.0  x.x  x.x  6.0  x.x  x.x  8.0  x.x  x.x
   x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x
   5.0  x.x  x.x  7.0  x.x  x.x  9.0  x.x  x.x
   x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x

   IAE = 2
   IAV = 48
   NVA = 3
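
To see which entries of A the strides IAE = 2 and IAV = 48 select, the offsets can be converted back to row and column positions. The following Python sketch is only an illustration and not part of the MAX library; the function name selected_positions is hypothetical, and NEV = 8 is an assumption, since the value used in the example is not shown here. The 16-row shape is taken from the listing of A.

```python
def selected_positions(iae, iav, nva, nev, rows):
    """Map the strided elements of A to 1-based (row, column) pairs.

    A is assumed to be stored column major with `rows` rows, so the
    element at 1-based offset p sits at row (p-1) % rows + 1 and
    column (p-1) // rows + 1.
    """
    vectors = []
    for i in range(1, nva + 1):
        vec = []
        for k in range(1, nev + 1):
            p = (k - 1) * iae + (i - 1) * iav + 1   # FORTRAN index into A
            vec.append(((p - 1) % rows + 1, (p - 1) // rows + 1))
        vectors.append(vec)
    return vectors


# IAE, IAV and NVA are the example's values; NEV = 8 is an assumption
# made for this sketch.
positions = selected_positions(iae=2, iav=48, nva=3, nev=8, rows=16)
```

With these values the three vectors fall on the odd-numbered rows of columns 1, 4, and 7, which are the positions that hold data in the listing of A.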
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Upon return from MDOT, C contains: 


      -23.0
      -25.0
      -32.0
      -46.0
      -69.0
      -39.0
      -46.0
      -61.0
      -87.0
     -126.0
      -55.0
      -66.0
      -90.0
     -128.0
     -183.0

VECMEM(2048,8*NMAX+4) where NMAX is the number of
available MAX modules. In this case, the operations
performed by MLOAD can be described in FORTRAN by:


      DO 20 i = 1, NVB
         DO 10 j = 1, NEV
            VECMEM(IVM+j-1,IVN+i-1) = B((i-1)*IBV+(j-1)*IBE+1)
   10    CONTINUE
   20 CONTINUE
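
The index arithmetic of the loop above can be tried out on a small case. The Python sketch below is a model of the FORTRAN loop only; it is not the MAX library, the name mload_model and the sample data are hypothetical, and VECMEM is shortened to 8 elements by 12 vectors for the demonstration.

```python
def mload_model(vecmem, ivm, ivn, b, ibe, ibv, nev, nvb):
    """Copy NVB strided vectors of B into VECMEM (1-based arguments).

    vecmem[e-1][v-1] models VECMEM(e,v); b[p-1] models B(p).
    """
    for i in range(1, nvb + 1):
        for j in range(1, nev + 1):
            src = (i - 1) * ibv + (j - 1) * ibe      # 0-based index into b
            vecmem[ivm + j - 2][ivn + i - 2] = b[src]


# Hypothetical data: B is a 3 x 4 matrix stored column major, with the
# entry in row r, column c equal to 10*r + c.
b = [11.0, 21.0, 31.0,
     12.0, 22.0, 32.0,
     13.0, 23.0, 33.0,
     14.0, 24.0, 34.0]
vecmem = [[0.0] * 12 for _ in range(8)]

# Load rows 1 and 3 of B (4 elements each) into vector memories 2 and 3,
# starting at element 5.
mload_model(vecmem, ivm=5, ivn=2, b=b, ibe=3, ibv=2, nev=4, nvb=2)
```

Here IBE = 3 steps across one column of B and IBV = 2 steps down two rows, so each row of the submatrix becomes one vector in the model of VECMEM.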


Summary of error conditions:

   IERR = -2   NVB <= 0.
   IERR = -1   NEV <= 0.
   IERR =  0   No error occurred. Normal completion.
   IERR =  1   No TMRAM on the system.
   IERR =  2   ITMA <= 8191.
   IERR =  3   ITMA > LASTTM-8192-254-4*(IVM+NEV-1)
               where LASTTM is the highest valid TMRAM
               address. Refer to Section K.2.
   IERR =  4   IVM <= 0.
   IERR =  5   IVM + NEV - 1 greater than 2048.
   IERR =  6   IVN <= 0.
   IERR =  7   IVN + NVB - 1 greater than NMAX*8 + 4
               where NMAX is the number of available
               MAX modules. Refer to Section K.2.


If there are too many or too few formal parameters, then 
IERR is left unchanged. 


For more information on the Matrix Oriented MAX 
routines, refer to Section K.4. 
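
The argument-range conditions in the error summary can be expressed as a small checking function. The Python sketch below is only a model; the name check_mload_args is hypothetical. It leaves out the TMRAM-related codes (1, 2, and 3), takes the vector memory length to be 2048 elements, and applies the checks in an assumed order, since the order used by the library is not stated.

```python
def check_mload_args(ivm, ivn, nev, nvb, nmax):
    """Return an IERR-style code for the argument checks listed above.

    The TMRAM-related codes (1, 2, 3) are not modelled, and the order
    in which the checks are applied is an assumption.
    """
    if nvb <= 0:
        return -2
    if nev <= 0:
        return -1
    if ivm <= 0:
        return 4
    if ivm + nev - 1 > 2048:
        return 5
    if ivn <= 0:
        return 6
    if ivn + nvb - 1 > nmax * 8 + 4:
        return 7
    return 0


# One MAX module: 12 vector memories of 2048 elements each.
ok = check_mload_args(ivm=1, ivn=1, nev=8, nvb=5, nmax=1)
too_long = check_mload_args(ivm=2042, ivn=1, nev=8, nvb=1, nmax=1)
too_many = check_mload_args(ivm=1, ivn=9, nev=1, nvb=5, nmax=1)
```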


EXAMPLE:      Assume one available MAX module and that the data in
              B is stored in column major order (normal FORTRAN).
              Load the rows of a submatrix of B into a subset of the MAX
              vector memories.

* MUNLD *  --- MATRIX UNLOAD ---  * MUNLD *

PURPOSE:      To unload a matrix of vectors from the MAX vector
              memories into Main Memory.

CALL FORMAT:  CALL MUNLD(IVM, IVN, B, IBE, IBV, NEV, NVB, IERR)


PARAMETERS : IVM = Integer input starting element in the MAX 
vector memories. 

IVN = Integer input starting vector in the MAX 
vector memories. 


B = Floating-point output matrix of vectors in 
Main Memory. 
IBE = Integer input element stride for vectors in B. 
IBV = Integer input element stride between 
vectors in B. 
NEV = Integer input number of elements per vector. 
NVB = Integer input number of vectors. 


IERR = Integer input/output error flag. See 
DESCRIPTION for a list of error conditions. 


DESCRIPTION:  MUNLD unloads the NVB vectors contained in the MAX
              vector memories to Main Memory defined by B, IBE, IBV,
              and NEV. An arbitrary data structure can be associated
              with B. The operation of MUNLD can be conveniently
              described if the set of MAX vector memories is considered
              to be a matrix VECMEM and B is also considered to be a
              matrix. Using this matrix convention, MUNLD performs
              the matrix transfer

                   B = VECMEM

              where IBE, IBV, IVM, IVN, NVB, and NEV allow the user
              to select subsets of B and VECMEM as appropriate.

              To illustrate the flexibility of the data structures
              that can be associated with the data, suppose B is
              a one-dimensional array, and that VECMEM is a matrix
              VECMEM(2048,8*NMAX+4) where NMAX is the number of
              available MAX modules. In this case, the operations
              performed by MUNLD can be described in FORTRAN by:


      DO 20 i = 1, NVB
         DO 10 j = 1, NEV
            B((i-1)*IBV+(j-1)*IBE+1) = VECMEM(IVM+j-1,IVN+i-1)
   10    CONTINUE
   20 CONTINUE
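
A small executable model makes it easy to see which locations of B the unload touches. The Python sketch below mirrors the FORTRAN loop above; it is not the MAX library, the name munld_model and the sample data are hypothetical, and VECMEM is shortened to 4 elements by 12 vectors for the demonstration.

```python
def munld_model(vecmem, ivm, ivn, b, ibe, ibv, nev, nvb):
    """Copy NVB vectors of VECMEM into strided locations of B.

    vecmem[e-1][v-1] models VECMEM(e,v); b[p-1] models B(p).
    """
    for i in range(1, nvb + 1):
        for j in range(1, nev + 1):
            dst = (i - 1) * ibv + (j - 1) * ibe      # 0-based index into b
            b[dst] = vecmem[ivm + j - 2][ivn + i - 2]


# Hypothetical contents: element e of vector v holds 100*e + v.
vecmem = [[100.0 * e + v for v in range(1, 13)] for e in range(1, 5)]
b = [None] * 8                      # None marks locations never written

# Unload elements 2..4 of vector memories 3 and 4 into B, storing each
# vector contiguously (IBE = 1) and starting a new vector every 4 words.
munld_model(vecmem, ivm=2, ivn=3, b=b, ibe=1, ibv=4, nev=3, nvb=2)
```

In this run the fourth and eighth words of B are never written, which illustrates that locations of B outside the selected subset keep their previous contents in the model.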
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MAX vector memories (VECMEM): 


       1.0  2.0  3.0  4.0  5.0
       1.0  1.0  2.0  3.0  4.0
       1.0  1.0  2.0  3.0  4.0
       1.0  1.0  1.0  2.0  3.0
       1.0  1.0  1.0  2.0  3.0
       1.0  1.0  1.0  1.0  2.0
       1.0  1.0  1.0  1.0  2.0


       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
       2.0  2.0  1.0  1.0  1.0  1.0  1.0  1.0
       3.0  3.0  2.0  2.0  1.0  1.0  1.0  1.0
       4.0  4.0  3.0  3.0  2.0  2.0  1.0  1.0
       5.0  5.0  4.0  4.0  3.0  3.0  2.0  2.0

       IERR = 0


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A system with NMAX available MAX modules has 
NVEC = 8 * NMAX + 4 


MAX vector memories, numbered from 1 to NVEC, which are 
partitioned into two banks of NVEC/2 vector memories 
each. Vector memories 1 through NVEC/2 make up one bank, 
and vector memories NVEC/2+1 through NVEC make up the 
other. The two banks of vector memories can be thought 
of as complementary, where vector memories 1 and NVEC/2+1 
are complements, 2 and NVEC/2+2 are complements, and so 
forth. 


Vector memories NVEC/2 and NVEC are not accessible for 
VMSA operations and hence are not used. 


For each vector in A, MVMSA performs the NVB VMSA
operations between that vector and the NVB vectors
residing in one bank of the MAX vector memories,
using the scale factors contained in C. The NVB
resultant vectors are written into the complementary
bank of MAX vector memories. IVN is toggled to point
to the starting vector memory in the output bank:

     IVN = MOD(IVN+NVEC/2, NVEC)


The NVB resultant vectors then become the input for the 
VMSA operations with the next vector of A. 
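
The bank arithmetic described above can be tried out directly. The Python sketch below is only an illustration and not part of the MAX library; the function names are hypothetical. It computes NVEC, the toggled value of IVN, and the vector memories that remain usable once NVEC/2 and NVEC are excluded.

```python
def nvec_for(nmax):
    """Number of vector memories for NMAX modules: NVEC = 8*NMAX + 4."""
    return 8 * nmax + 4


def output_bank_start(ivn, nvec):
    """Toggle IVN to the complementary bank: MOD(IVN + NVEC/2, NVEC)."""
    return (ivn + nvec // 2) % nvec


def usable_memories(nvec):
    """Vector memories other than NVEC/2 and NVEC, which are not used."""
    half = nvec // 2
    return [v for v in range(1, nvec + 1) if v not in (half, nvec)]


nvec = nvec_for(1)                       # one MAX module
toggled = output_bank_start(1, nvec)     # input bank starts at memory 1
back = output_bank_start(toggled, nvec)  # toggling again returns to bank 1
usable = usable_memories(nvec)
```

For one module this gives 12 vector memories, with memory 1 paired with memory 7, and memories 6 and 12 left out of the usable set.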


Note that due to the parallel operation of the MAX, all
NVEC/2 - 1 vector memories in the output bank will be
overwritten. The unused NVEC/2 - 1 - NVB vector memories
will contain extraneous results. Refer to EXAMPLE
for more details.

   IERR =  5   IVM + NEV - 1 greater than 2048.
   IERR =  6   IVN <= 0.
   IERR =  7   NVB > NVEC/2 - 1 or
               MOD(IVN,NVEC/2) + NVB - 1 > NVEC/2 - 1 or
               IVN = NVEC/2 or
               IVN = NVEC
               where NVEC = 8*NMAX+4 and NMAX is the
               number of available MAX modules. Refer
               to Section K.2.


If there are too many or too few formal parameters, then 
IERR is left unchanged. 


For more information on the Matrix Oriented MAX 
routines, refer to Section K.4. 


EXAMPLE:      Assume one available MAX module and that the data in A and
              C is stored in column major order (normal FORTRAN). Perform
              the VMSA operations of the rows of a submatrix of A with a
              subset of the MAX vector memories using the scalars
              contained in the columns of a submatrix of C.

Input matrix A:

   1.0  x.x  1.0  x.x  1.0  x.x  1.0  x.x  1.0
   x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x
   1.0  x.x  2.0  x.x  3.0  x.x  4.0  x.x  5.0
   x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x
   1.0  x.x  2.0  x.x  3.0  x.x  2.0  x.x  1.0
   x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x  x.x

   IAE = 12
   IAV = 2
   NVA = 3

Input matrix C: 
isl Rex + Kat 2 xe: ex: 
Mist “HO ee eR CRG: UX XL 
De Re Ke Le KK ex: OS 
MER ASU: OMSK SRS OR KK Sea 
3.0 “«.x x.x O.8 x.x x.x 2.9 
Ki Mae “Oak eae Kak eek “KU 
4.9 x.x x.x 3.8 x.x x.x 3.9 
Sox Ke. HEX ak “KEK. xa XU 
   ICE = 2
   ICV = 24

* MVSMA *  --- MATRIX VECTOR SCALAR MULTIPLY ADD ---  * MVSMA *

PURPOSE:      To perform vector scalar multiply add (VSMA) operations
              between a matrix of vectors and a matrix of scalars in
              Main Memory and a matrix of vectors in the MAX vector
              memories.

CALL FORMAT:  CALL MVSMA(A, IAE, IAV, IVM, IVN, C, ICE, ICV,
                         NEV, NVA, NVB, IFUN, IERR)

PARAMETERS:   A    = Floating-point input matrix of vectors in
                     Main Memory.
              IAE  = Integer input element stride for vectors in A.
              IAV  = Integer input element stride between
                     vectors in A.
              IVM  = Integer input starting element in the MAX
                     vector memories.
              IVN  = Integer input/output starting vector for the
                     input/output vectors in the MAX vector memories.
              C    = Floating-point input matrix of scalars.
              ICE  = Integer input element stride for C.
              ICV  = Integer input element stride between
                     vectors in C.
              NEV  = Integer input VSMA length.
              NVA  = Integer input number of vectors in A to be used.
              NVB  = Integer input number of vectors in the MAX
                     vector memories to be used as input. Also
                     the number of scalars per vector of C to be used.
              IFUN = Integer input function flag.
                        IFUN = 0:  r =  1.0
                        IFUN = 1:  r = -1.0
                     Note that IFUN is a bit-mapped function flag:
                     IFUN = 2 is equivalent to IFUN = 0, IFUN = 3
                     is equivalent to IFUN = 1, etc.
                     See DESCRIPTION for the usage of r.
              IERR = Integer input/output error flag.
                     See DESCRIPTION for a list of error conditions.



DESCRIPTION: MVSMA performs VSMA operations of length NEV between 
the NVA vectors in Main Memory defined by A, IAE, and 
IAV with the NVB vectors in the MAX vector memories 
defined by IVM and IVN, using the elements of C in 
Main Memory defined by ICE and ICV as the scale factors. 
The output of the VSMA operations is written into the 
MAX vector memories. 
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The operation of MVSMA can be conveniently described if 
the set of MAX vector memories is considered to be a
matrix VECMEM(2048,NVEC). Arbitrary data structures
can be associated with A and C. For simplicity, assume
that A and C are also matrices, dimensioned A(IAV,NVA)
and C(ICV,NVA), respectively. Further, assume that IAE
and ICE are both equal to one. Given these assumptions,
the computations performed by MVSMA can be described
in FORTRAN by:


      DO 30 i = 1, NVA
         IVO = MOD(IVN+NVEC/2, NVEC)
         DO 20 j = 1, NVB
            DO 10 k = 1, NEV
               VECMEM(IVM+k-1,IVO+j-1) = r * C(j,i) * A(k,i) +
     +                                   VECMEM(IVM+k-1,IVN+j-1)
   10       CONTINUE
   20    CONTINUE
         IVN = IVO
   30 CONTINUE


To illustrate the generality and the flexibility of the 
data structures that can be associated with the data, 
suppose that A and C are one-dimensional arrays. In this 
case, the operations performed by MVSMA can be described 
in FORTRAN by: 


      DO 30 i = 1, NVA
         IVO = MOD(IVN+NVEC/2, NVEC)
         DO 20 j = 1, NVB
            DO 10 k = 1, NEV
               VECMEM(IVM+k-1,IVO+j-1) = r *
     +                      C((j-1)*ICE+(i-1)*ICV+1) *
     +                      A((k-1)*IAE+(i-1)*IAV+1) +
     +                      VECMEM(IVM+k-1,IVN+j-1)
   10       CONTINUE
   20    CONTINUE
         IVN = IVO
   30 CONTINUE
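
The loop nest above, including the toggling of IVN between banks, can be exercised with a small model. The Python sketch below is only an illustration of the FORTRAN description and is not the MAX library; the name mvsma_model and the sample data are hypothetical. VECMEM is shortened to 2 elements by 12 vectors, and the extraneous results that the hardware writes to unused vector memories are not modelled.

```python
def mvsma_model(a, iae, iav, vecmem, ivm, ivn, c, ice, icv,
                nev, nva, nvb, r):
    """Model of the MVSMA loop nest; returns the final value of IVN.

    vecmem[e-1][v-1] models VECMEM(e,v); a[p-1] and c[p-1] model the
    one-dimensional arrays A(p) and C(p).
    """
    nvec = len(vecmem[0])
    for i in range(1, nva + 1):
        ivo = (ivn + nvec // 2) % nvec           # start of output bank
        for j in range(1, nvb + 1):
            scale = r * c[(j - 1) * ice + (i - 1) * icv]
            for k in range(1, nev + 1):
                a_val = a[(k - 1) * iae + (i - 1) * iav]
                vecmem[ivm + k - 2][ivo + j - 2] = (
                    scale * a_val + vecmem[ivm + k - 2][ivn + j - 2])
        ivn = ivo                                # outputs become inputs
    return ivn


# Hypothetical data: two vectors of A, two scalars per vector in C.
a = [1.0, 2.0, 3.0, 4.0]        # vectors (1,2) and (3,4)
c = [10.0, 20.0, 30.0, 40.0]    # scalars (10,20) and (30,40)
vecmem = [[0.0] * 12 for _ in range(2)]
vecmem[0][0] = vecmem[1][0] = 1.0    # vector memory 1 = (1,1)
vecmem[0][1] = vecmem[1][1] = 2.0    # vector memory 2 = (2,2)

final_ivn = mvsma_model(a, 1, 2, vecmem, 1, 1, c, 1, 2,
                        nev=2, nva=2, nvb=2, r=1.0)
```

With an even number of vectors in A the results end up back in the bank that held the input, so the returned IVN equals the starting value; with an odd number they end up in the complementary bank.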


Summary of error conditions:
   IERR = -3   NVA <= 0.
   IERR = -2   NVB <= 0.
   IERR = -1   NEV <= 0.
   IERR =  0   No error occurred. Normal completion.
   IERR =  1   No TMRAM on the system.
   IERR =  4   IVM <= 0.
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MAX vector memories (VECMEM) 


KeoX 


XX 


1.0 2.8 3.0 4.8 x.x 


XX 


1.9 2.8 x.x 


1.01.8 1.8 1.8 x.x x.x 
1.89 1.8 1.8 1.8 x.x 


1.8 1.8 


X.X 


X.X 


XX 
XX 


XX 


NEV 


Upon return from MVSMA, the MAX vector memories (VECMEM) 


contain: 


X.X 


5.89 5.89 8.8 14.9 
8.9 
yey 11.9 6.9 19.9 24.9 
yey 12.9 7.89 8.8 23.9 


Yay 


X.xX 
XX 
XX 
X.X 
XX 


5.8 6.8 11.0 


-¥ 4.8 
-¥ 


Y 
Sg 


XX 
XX 


5.6 9.9 19.8 


5.8 13.6 
4.9 15.9 
y-y 19.9 7.8 4.817.9 


6.8 5.9 
y-y 8.8 6.2 


xX.X 
X.X 


y-y 13.9 8.8 6.8 23.8 


yey 12.89 8.9 4.9 29.9 


Note that vector memories 1 and 7 contain extraneous results,
and that vector memories 6 and 12 remain unchanged because
they are not accessible for VSMA operations.
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APPENDIX L 


MAX ROUTINES IN ALPHABETICAL ORDER 


NAME      DESCRIPTION                                 PAGE
CMDOT     COMPLEX MATRIX DOT PRODUCT                  K - 100
CMLOAD    COMPLEX MATRIX LOAD                         K - 104
CMUNLD    COMPLEX MATRIX UNLOAD                       K - 107
CMVSMA    COMPLEX VECTOR MULTIPLY SCALAR ADD          K - 109
MDOT      MATRIX DOT PRODUCT                          K - 113
MLOAD     MATRIX LOAD                                 K - 118
MUNLD     MATRIX UNLOAD                               K - 121
MVMSA     MATRIX VECTOR MULTIPLY SCALAR ADD           K - 124
MVSMA     MATRIX VECTOR SCALAR MULTIPLY ADD           K - 125
PCDOT     PARALLEL COMPLEX DOT PRODUCT                K - 13
PCNV2D    PARALLEL 2-D CONVOLUTION AND CORRELATION    K - 17
PDOT      PARALLEL DOT PRODUCT                        K - 20
PIDOT     PARALLEL INDEXED DOT PRODUCT                K - 24
PILOAD    PARALLEL LOAD FOR PIDOT                     K - 27
PLDCD     PARALLEL COMPLEX LOAD                       K - 30
PLOADD    PARALLEL LOAD FOR PDOT/PTSLVK               K - 34
PLOADV    PARALLEL LOAD FOR PVSMA AND PVMSA           K - 37
PLUFAC    PARALLEL LU MATRIX FACTORIZATION            K - 40
PLUSLV    SOLVER FOR PLUFAC                           K - 42
PMMUL     PARALLEL MATRIX MULTIPLY                    K - 44
PMOVE     PARALLEL MOVE                               K - 47
PSGEFA    PARALLEL REAL GENERAL MATRIX FACTOR         K - 52
PTSLVK    PARALLEL TRIANGULAR SOLVE KERNEL            K - 32
PTSOLV    PARALLEL TRIANGULAR SOLVE                   K - 56
PUNLDD    PARALLEL UNLOAD FOR PTSLVK                  K - 59
PUNLDV    PARALLEL UNLOAD FOR PVSMA AND PVMSA         K - 62
PVMSA     PARALLEL VMSA                               K - 65
PVSMA     PARALLEL VSMA                               K - 70
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APMATH64/MAX KEY WORD INDEX 


This index of APMATH64/MAX routines is sorted by key words that appear 
in each routine title. Each title can contain more than one key word. 
The key words are listed alphabetically to the right of the gap running 
down the center of each page. 


To use the key word index, locate a key word that is representative of
the desired APMATH64/MAX function. Applicable APMATH64/MAX routine
names and titles can be found on the same line with each occurrence of
the key word. The routine name appears in brackets ([ ]). The routine
title immediately follows the routine name and continues on the other
side of the gap when necessary. The ellipsis (...) is placed directly
after the last word in the title if the line wraps around. The page
where a particular routine is documented can be found in Appendix L.


               [PCNV2D] PARALLEL  2-D CONVOLUTION AND CORRELATION
  COMPLEX VECTOR MULTIPLY SCALAR  ADD...[CMVSMA]
   MATRIX VECTOR MULTIPLY SCALAR  ADD...[MVMSA]
   MATRIX VECTOR SCALAR MULTIPLY  ADD...[MVSMA]
                [PCDOT] PARALLEL  COMPLEX DOT PRODUCT
                [PLDCD] PARALLEL  COMPLEX LOAD
                         [CMDOT]  COMPLEX MATRIX DOT PRODUCT
                        [CMLOAD]  COMPLEX MATRIX LOAD
                        [CMUNLD]  COMPLEX MATRIX UNLOAD
                  ADD...[CMVSMA]  COMPLEX VECTOR MULTIPLY SCALAR
           [PCNV2D] PARALLEL 2-D  CONVOLUTION AND CORRELATION
             2-D CONVOLUTION AND  CORRELATION...[PCNV2D]
          [CMDOT] COMPLEX MATRIX  DOT PRODUCT
                   [MDOT] MATRIX  DOT PRODUCT
        [PCDOT] PARALLEL COMPLEX  DOT PRODUCT
                 [PDOT] PARALLEL  DOT PRODUCT
        [PIDOT] PARALLEL INDEXED  DOT PRODUCT
    PARALLEL REAL GENERAL MATRIX  FACTOR...[PSGEFA]
     [PLUFAC] PARALLEL LU MATRIX  FACTORIZATION
          [PSGEFA] PARALLEL REAL  GENERAL MATRIX FACTOR
                [PIDOT] PARALLEL  INDEXED DOT PRODUCT
       PARALLEL TRIANGULAR SOLVE  KERNEL...[PTSLVK]
         [CMLOAD] COMPLEX MATRIX  LOAD
                  [MLOAD] MATRIX  LOAD
        [PLDCD] PARALLEL COMPLEX  LOAD
               [PLOADD] PARALLEL  LOAD FOR PDOT/PTSLVK
               [PILOAD] PARALLEL  LOAD FOR PIDOT
               [PLOADV] PARALLEL  LOAD FOR PVSMA AND PVMSA
               [PLUFAC] PARALLEL  LU MATRIX FACTORIZATION
                 [CMDOT] COMPLEX  MATRIX DOT PRODUCT
                          [MDOT]  MATRIX DOT PRODUCT
  [PSGEFA] PARALLEL REAL GENERAL  MATRIX FACTOR
            [PLUFAC] PARALLEL LU  MATRIX FACTORIZATION
                [CMLOAD] COMPLEX  MATRIX LOAD

               [PUNLDV] PARALLEL  UNLOAD FOR PVSMA AND PVMSA
                [CMVSMA] COMPLEX  VECTOR MULTIPLY SCALAR ADD
                  [MVMSA] MATRIX  VECTOR MULTIPLY SCALAR ADD
                  [MVSMA] MATRIX  VECTOR SCALAR MULTIPLY ADD
                [PVMSA] PARALLEL  VMSA
                [PVSMA] PARALLEL  VSMA
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